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Abstract 

In the thesis we take the split chain approach to analyzing Markov chains 
and use it to establish fixed-width results for estimators obtained via Markov 
chain Monte Carlo procedures (MCMC). Theoretical results include neces- 
sary and sufficient conditions in terms of regeneration for central limit the- 
orems for ergodic Markov chains and a regenerative proof of a CLT version 
for uniformly ergodic Markov chains with E n f 2 < oo. To obtain asymptotic 
confidence intervals for MCMC estimators, strongly consistent estimators 
of the asymptotic variance are essential. We relax assumptions required to 
obtain such estimators. Moreover, under a drift condition, nonasymptotic 
fixed-width results for MCMC estimators for a general state space setting 
(not necessarily compact) and not necessarily bounded target function / are 
obtained. The last chapter is devoted to the idea of adaptive Monte Carlo 
simulation and provides convergence results and law of large numbers for 
adaptive procedures under path-stability condition for transition kernels. 

Keywords and phrases: Markov chain, MCMC, adaptive Monte 
Carlo, split chain, regeneration, drift condition, (e— a)— approximation, 
confidence intervals, asymptotic confidence intervals, central limit 
theorem, law of large numbers 

AMS Subject Classification: 60J10, 60J05, 60F15, 60F05 
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Streszczenie 

W pracy przedstawione sq, rezultaty d0tycz3.ee estymacji staloprecyzyjnej 
dla algorytmow Monte Carlo opartych na laricuchach Markowa (MCMC). 
Podstawowa^ technika^ w analizie laricuchow Markowa i zwiajzanych z nimi 
procedur MCMC, jest laricuch rozszczepiony i regeneracja, co prowadzi do 
koniecznego i dostatecznego warunku w terminach regeneracji dla central- 
nego twierdzenia granicznego dla ergodycznych laricuchow Markowa. Do- 
datkowym rezultatem jest regeneracyjny dowod CTG dla jednostajnie er- 
godycznych laricuchow Markowa przy zalozeniu E w f 2 < oo. Aby otrzymac 
asymptotyczne przedzialy ufnosci za pomoca^ algorytmow MCMC konieczna 
jest m.in. mocno zgodna estymacja wariancji asymptotycznej . Oslabiamy 
znane zalozenia wymagane do konstrukcji takich estymatorow. Przy zaloze- 
niu warunku dryfu, ale bez zalozeri o ograniczonosci funkcji podcalkowej 
/ i zwartosci przestrzeni stanow, otrzymujemy nieasymptotyczna_ estymacja 
staloprecyzyjn^,. Ostatni rozdzial poswiexony jest procedurom adaptacyjnym, 
a uzyskane tarn wyniki dotycz^ce zbieznosci i prawa wielkich liczb zakladaja^ 
stabilnosc operatorow przejscia wzgl^dem trajektorii. 

Slowa kluczowe: laricuch Markowa, MCMC, adaptacyjne Monte 
Carlo, laricuch rozszczepiony, regeneracja, warunek dryfu, (e — 
a)— aproksymacja, przedzialy ufnosci, asymptotyczne przedzialy ufnosci, 
centralne twierdzenie graniczne, prawo wielkich liczb 

Klasyfikacja tematyczna wg. AMS: 60J10, 60J05, 60F15, 60F05 
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Chapter 1 
Introduction 



In this chapter we give some background for results presented in later chap- 
ters and introduce main ideas behind the thesis in an informal way. There- 
fore mathematical rigour will not always be our priority here. We start with 
defining the problem addressed by Markov chain Monte Carlo methods in 
Section 11.11 and proceed to describing typical sampling schemes and MCMC 
algorithms (the Metropolis algorithm and the Gibbs sampler) in Section [L2l 
Section 11.31 provides an overview of the results of the thesis. 



1.1 Markov Chain Monte Carlo 

Let X be a region in a possibly high-dimensional space, and let / be a real 
valued function on X . Moreover consider a probability distribution it with 
density p with respect to some standard measure dx, usually either Lebesque 
or counting measure, i.e. ir(dx) = p{x)dx. An essential part of many problems 
in Bayesian inference, statistical physics and combinatorial enumeration is 
the computation of analytically intractable integral 

I = EJ = 7rf= [ f{x)ir(dx), (1.1) 

J X 

where p and thus it is known up to a normalizing constant and direct simula- 
tion from Ti is not feasible (see e.g. [Casella fc Robert 1999] . |Liu, JS 2001| ). 



The common approach to this problem is to simulate an ergodic Markov chain 
(X n ) n > , using a transition kernel P, with stationary distribution n, which 
ensures the convergence in distribution of X n to a random variable from ir. 
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Thus, for a "large enough" t, X n for n > t can be considered as having 
distribution approximately equal to ir. Since a simple and powerful algo- 
rithm for constructing such a Markov chain has been introduced in 1953 by 
Metropolis et al. in the very seminal paper [Metropolis et al. 1953| , various 
sampling schemes and approximation strategies for estimating the unknown 
value of / have been developed and analyzed ( [Niemiro fc Pokarowski 2007) . 
|Liu, JS 2001| , [Casella fc Robert 1999| ). The method is referred to as Markov 



chain Monte Carlo (MCMC). 

To avoid problems with integrating functions with respect to probabil- 
ity distributions with unknown normalizing constants, Bayesian statisticians 
used to restrict attention to conjugate priors (see e.g. [Robert 1994] ) . This 
concept, although technically appealing, deprives the bayesian approach of 
flexibility which is one of its main strengths. Also, when building complex 
models with many parameters (as in the example of Section ??), even us- 
ing conjugate priors usually leads to intractable multidimensional posterior 
distributions. 

The invention of MCMC has transformed dramatically Bayesian infer- 
ence since it allows practitioners to sample from complicated posterior dis- 
tributions and to integrate functions with respect to these distributions. 
Thus Bayesian inference became a feasible and powerful approach for prac- 
titioners and now receives immense attention from the statistics community 
( [Roberts fc Rosenthal 2005] . [Casella fc Robert 1999] V 

In addition to their importance for applications, MCMC algorithms rise 
numerous questions related to Markov chains and probability. It is crucial 
to understand the nature and speed of convergence of the distribution of X n 
to ii as n — > oo. 



1.2 Sampling Schemes and MCMC Algorithms 

Before we proceed to the description of MCMC algorithms let us recall the 
independent Monte Carlo solution to the problem in (11.11) when simulating 
from Ti is feasible. In this case one takes i.i.d. random variables X iy . . . , X n ~ 
7r and estimates I by 

1 n 

= (1-2) 

• 1 

i=i 

Remark 1.2.1. Basic properties of the independent Monte Carlo estimation 
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are very easy to obtain. 

• If / exists then I n is its unbiased and (by the weak law of large numbers) 
consistent estimate. 

• Furthermore, if nf 2 < oo, then by the classical Central Limit Theorem 

yfr(f-T)*>N(0,*f a -(*/)>). 

• Confidence intervals for / can be obtained e.g. by the Chebyshev in- 
equality 

*f 2 - K) 2 



P(\ln-!\>S)< 



ne 



2 



provided that the variance nf 2 — (nf) 2 can be bounded a priori. 
Asymptotic confidence intervals can be derived from the CLT, 

y/ne 



P(\I n -I\>e)<2 




7T 



and effectively computed using a consistent estimate or an upper bound 

0f7T/ 2 -(7T/) 2 . 

Assume now the MCMC setting, where no efficient procedure for sam- 
pling independent random variables from ir is available. Let (X n ) n > be an 
ergodic Markov chain on X with transition kernel P and stationary limit- 
ing distribution ir. Let ir denote the initial distribution of the chain, i.e. 
X ~ 7r . The distribution of X t is it t = vr P* — > ir, but X ,Xi,... are 
dependent random variables and 01.21) is no longer an obvious and easy to 
analyze estimator. There are several possible strategies (cf. |Geyer 1992 



[Niemiro fc Pokarowski 2007j . |Chan fc Yue 1996] . |Liu, JS 2001] , |Casella fc Robert 1999] ). 



Estimation Along one Walk. Use average along a single trajectory of 
the underlying Markov chain and discard the initial part to reduce bias. 
In this case the estimate is of the form 



1 t+n-l 

lln = - /M (1-3) 



n 

i=t 



and t is called the burn-in time. 
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Estimation Along one Walk with Spacing. Discard the initial part of a 
single trajectory to reduce bias and then take every s— th observation 
to reduce correlation. In this case the estimate is of the form 



- t+n-l 

/*»- = - E -ft**) m 



n 

and s is called the spacing parameter. 

Multiple Run. Use average over final states of multiple independent 
runs of the chain. Thus we need first to simulate say n trajectories of 
length say t: 

A ,A X . .., A t , 



v (n) ^(n) ^(n) 
A ,A X A t , 

and for an estimate we take 

n 
m=l 

where m numbers the independent runs of the chain and t should be 
large enough to reduce bias. 

Median of Averages. Use median of multiple independent shorter runs. 
Here we simulate 

— Simulate m independent runs of length t + n of the underlying 
Markov chain, 

^-0 ) • • • ) ^-t+n-1) K—l,...,m. 

— Calculate m estimates of J, each based on a single run, 



t+n-l 

/ 1 



i=t 

For the final estimate take 



I = med{Ji, . . . ,I m }. 
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The one walk estimators are harder to analyze since both X t , . . . X t+n _i and 
Xta, ■ ■ -X(t+n-i)s are n °t independent, whereas x[ l \ . . . ,X j ! m ' ) are. Yet one 
walk strategies are believed to be more efficient and are usually the prac- 
titioners' choice. Some precise results comparing the first three estimators 
under certain assumptions are available and confirm the practitioners' intu- 
ition. We refer to them later. 

For each choice of estimation strategy additional questions arise, since one 
has to decide how to chose parameters t, n or t, n, s or t, n, m respectively, 
that assure "good quality of estimation". This choice must clearly depend on 
how one defines the desired "quality of estimation". 

Moreover, we see from the above that MCMC requires a Markov chain 
on X which is easily run on a computer, and which has ir as its stationary 
limiting distribution. It may be a bit surprising that there exist reasonably 
general recipes for constructing such a chain that converges to it in most 
settings of practical interest. 

1.2.1 The Metropolis Algorithm 

The Metropolis algorithm has been introduced by Metropolis et al. in [Metropolis et al. 1953| . 
Let Q be a transition kernel of any other Markov chain that is easily simu- 
lated on a computer. Recall that ir(-) has a density ir(dx) = p(x)dx, with 
possibly unknown normalizing constant. Let also Q(x, ■) have a density 
Q(x, dy) = q(x, y)dy. These densities are taken with respect to some a— finite 
reference measure dx, which typically is the Lebesgue measure on R d , how- 
ever other settings are possible, including counting measures on discrete state 
spaces. 

The Metropolis algorithm proceeds as follows. 

1. Draw X from an initial distribution 7r (typically 7r = 5 XQ for some 
x G X). 

2. Given X n draw a proposal Y n+1 from Q(X n , •). 
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Set 



X, 



n+l — 



X, 



n+l 



n 



with probability a(X n ,Y n+ i), 
with probability 1 — a(X n , Y n +i), 



where 
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(Also, set a(x, y) = 1 whenever p(x)q(x,y) = 0.) 



4. Replace n by n + 1 and go to 2. 

Note that one only has to compute the ratio of densities p(y)/p(x), and 
hence the unknown normalizing constant for it in the acceptance probability 
a(x, y) simplifies and one does not need to know it to run the chain. 

Choosing the proposal density is another question that arises when imple- 
menting the Metropolis algorithm and different ways of doing it lead to differ- 
ent classes of algorithms. Typical classes include (see e.g. [Roberts fc Rosenthal 2005] ) 

• Symmetric Metropolis Algorithm. In this case q(x,y) = q(y,x) and 
hence a(x, y) = min{l, ^M}. 

• Random Walk Metropolis-Hastings. In this case q(x,y) = q(y — x). 

• Independence Sampler. In this case the proposal does not depend on 
x, i.e. q(x,y) = q(y). 

• Langevin Algorithm. Where Q(X n ,-) = N(X n + {6/2)V log7r(X n ), 5) 
for some 5 > 0. 



1.2.2 The Gibbs Sampler 

The Gibbs Sampler is suitable in a setting where X is a product space. For 
simplicity we suppose in this section that X is an open subset of R d , and 
write x = (xi, . . . , Xj). 

The i— th component Pj of the Gibbs sampler P replaces x± by a draw 
from the conditional distribution 7r(xj|xi, . . . , . . . , Xa). 

To state it more formally let, similarly as in [Roberts fc Rosenthal 20 05J. 

S x ,i, a ,b = {y £ X; yj = xj for j ^ i, and a < y { < b}. 

And 

p, r, \ Ig P( x li ■ ■ ■ i Xj-h t> X i+li ■ ■ ■ i X d)dt 

-ii\X,O x i a hj poo . 1^1. DJ 

J_ oo p(xi, Xi-i, t, x i+1 , x d )dt 

Now the deterministic scan Gibbs sampler uses the transition kernel 

P = Pl P 2 ...p di (i.7) 



12 



i.e. updates the coordinates of X n in a systematic way, one after another, 
with draws from full conditional distributions. 

On the other hand the random scan Gibbs sampler choses a coordinate 
uniformly at random and performs its update, i.e. it uses the transition 
kernel 

P=\±P, (1.8) 

1=1 

In the example of Section 15.61 drawing from conditional distributions will 
be straightforward and in fact this is often the case for bayesian posterior 
distributions. However, if this step is infeasible, then instead of using Pi as 
defined in (11.61) . one performs one step of a Metropolis algorithm designed to 
update i — th coordinate. Such a procedure is then called Metropolis within 
Gibbs algorithm. 



1.3 Overview of the Results 

Existing literature on Markov chains and their applications to Markov chain 
Monte Carlo procedures is to large extent focused on obtaining bounds on 
convergence rates to the stationary distribution ( [Baxendale 2005| . |Douc et al. 2 003]. 
[Jones fc Hobert 2004j . [Roberts fc Tweedie 1999] . [Rosenthal 1995bj ) and on 
asymptotical results for MCMC estimators ( [Jones et al. 2006] . |Kipnis fc Varadhan 1 986 1, 
|Meyn &: Tweedie 1993 ]) . However, when analyzing MCMC estimators, re- 
sults on the rate of convergence to the stationary distribution allow only 
to keep bias in control and do not translate in a straightforward way into 
bounds on the mean square error or confidence intervals. Moreover, asymp- 
totic results may turn out useless in practice and may even be misleading 
( [Roberts fc Rosenthal 2005| ). 

The main goal of this thesis is to obtain fixed-width results for an esti- 
mator, say /, based on an MCMC algorithm. In particular we strive for the 
(e — a)— approximation, i.e. 

P{\i-I\ >e) <a, (1.9) 

where e is the desired quality of estimation and a is the confidence level. 

In analyzing Markov chains and estimators based on MCMC procedures 
we take the regenerative approach based on the split chain. The split chain 
construction allows to divide the Markov chain trajectory into independent 
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or 1— dependent blocks and turns out to be an extremely powerful technique 
with wide range of applications. The approach has been introduced indepen- 
dently in |Athreya fc Ney 1978| and [Nummelin 1978] and immensely devel- 



oped in [Nummelin 1984] and |Meyn fc TweedieT 993|. We give the basics of 
the approach in Chapter [2l 

Results related to (11.91) are known in literature for discrete state space X 
and bounded function / ( [Aldous 1987] . [Oillman 1998] . |Le6n fc Perron 2004| V 
For general state space X, and uniformly ergodic Markov chains (which in 
practice implies that X is compact) and bounded function /, exponential 
inequalities are available (due to |Glynn fc Ormoneit 2002 and an improved 



result due to |Kontoyiannis at al. 2005| ) thus (e — a)— approximation can be 
easily deduced. 

For a general, not necessarily compact, state space X (or equivalently, 
not uniformly ergodic chains) and unbounded function / (which is e.g. the 
case when computing bayesian estimators for a quadratic loss function) no 
nonasymptotic results of type fl 1 . 9f) are available. Fixed-width estimation is 
performed by deriving asymptotic confidence intervals based on 



1 n— 1 



i=0 



This construction requires two steps. First requirement is that a central limit 
theorem must hold, i.e. 

^^N(0,aj), (1.10) 

where o\ < oo is the asymptotic variance. The second step is to obtain 
a strongly consistent estimator o\ of aj. Recent paper [Jones et al. 2006] 
presents the state of the art approach to the problem. 

Results of Chapter [3] and Chapter [4] are related to this methodology. 
In Chapter [3j based on |Bednorz, Latala fc Latuszyhski 2008 



a neces- 



sary and sufficient condition in terms of regeneration for a central limit theo- 
rem for functionals of ergodic Markov chains (as defined in (11.101) have been 
obtained. It turns out, that the CLT holds if and only if excursions be- 
tween regenerations are square integrable. An additional result of Chapter 
[3] is a solution to the open problem posed in [Roberts &: Rosenthal 20 05J . 
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i.e. a regeneration proof of a CLT for uniformly ergodic Markov chains with 
E n f 2 < oo. 

Chapter [U based on |Bednorz &; Latuszyfiski 2007| , is devoted to relaxing 
assumptions for strongly consistent estimators of aj. Results of Chapter [4] 
improve the methodology of [Jones et al. 20061 . 

In Chapter [5] nonasymptotic results of type (11.91) are obtained for noncom- 
pact state space X and without assuming boundedness of the target function 
/• 

More precisely, the goal of this chapter is to analyze estimation along one 
walk 

1 t+n-l 
i=t 

of the unknown value I under the following drift condition towards a small 
set. 

(A.l) Small set. There exist C E B(X), (3 > and a probability measure v 
on (X, B{X)) such that for all x E C and A E B(X) 

P(x,A) > (3v(A). 



(A. 2) Drift. There exist a function V : X — > [1, oo) and constants A < 1 and 
K < oo satisfying 



(A. 3) Aperiodicity. There exists (3 > such that $v{C) > [3. 

Under this assumption we provide explicit lower bounds on the burn-in time t 
and the length of simulation n that guarantee (e — a)— approximation. These 
bounds depend only and explicitly on the estimation parameters e and a, 
drift parameters /3, [3, A, K and the the V— norm of the target function /, i.e. 
\P\ v = su Px f\x)/V(x). 

Moreover we analyze also estimation by the median of averages intro- 
duced in the previous section. It turns out that for small a sharper bounds 
on the total simulation cost needed for (e — a)-approximation are available 
in this case by a simple exponential inequality. 
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The results of Chapter [5] have been applied for Gibbs samplers for a 
Hierarchical Random Effects Model of practical interest enabling nonasymp- 
totic fixed-width analysis of this model. In particular this extends the results 
form [Jones fc Hobert 2004] . where burn in bounds in terms of total variation 
norm have been established for this model. 

Chapter [6] deals with a slightly different topic, namely adaptive proce- 
dures. The idea is to modify the transition kernel based on the information 
collected during the simulation. This usually leads to a stochastic process 
that are not Markov chains any more and are less tractable theoretically. 
On the other hand, an adaptive procedure at time n as allowed to make use 
of an additional information: the sample trajectory up to time n. Clearly 
the class of stochastic processes used for simulation is bigger. Thus a smart 
use of the idea may lead to improvements in estimation quality. Simula- 
tions confirm this expectations and numerical examples for numerous spe- 
cific algorithms outperform classical procedures [Roberts &: Rosenthal 20 06J . 
[Kohn fc Nott 2005] . An important example of the application of adaptive 
schemes is the Metropolis algorithm with multivariate normal proposal. In 
this case adaptation allows for automated choice of the covariance matrix for 
the proposal distribution [Atchade fc Rosenthal 2005) . Theoretical results on 
convergence and quality of estimation for adaptive procedures are very mod- 
est so far. Typical conditions that allow for investigation of convergence are 
called diminishing adaptation will be provided in Chapter [61 Time stability 
conditions for transition kernels assumed in ( [Atchade fc Rosenthal 20 05J . 
[Kohn fc Nott 2 005]) fit into the diminishing adaptation framework. Intu- 
itively time stability means that the adaptive process approaches a time 
homogeneous Markov chain. 

In Chapter [6] we prove two results a convergence rate theorem and a law of 
large numbers for adaptive schemes. For both results we assume a path sta- 
bility condition for transition kernels which is weaker then the time stability 
condition, assumed in [Atchade fc Rosenthal 2005] to prove similar results. 
The path stablity condition results from time stability condition by the tri- 
angle inequality and intuitively means that the adaptive process approaches 
a time in-homogeneous Markov chain. 
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Chapter 2 

Some Markov Chains 



In this chapter we give some basic definitions and facts about stationarity 
and ergodicity of Markov chains that justify the Metropolis algorithm and the 
Gibbs sampler of Section [L2l and provide grounds for the MCMC methodol- 
ogy. Next we outline the regeneration construction and the split chain and 
introduce typical objects and tools useful in for analyzing regenerative chains. 
Systematic, applications driven development of Markov chains theory via re- 
generation can be found e.g. in |Meyn h Tweedie 1993 and [Nummelin 1 984J 
that constitute an immense body of work. Hence we we do not attempt a sys- 
tematic treatment of the Markov chain theory here and this chapter, based on 
|Meyn fc Tweedie 1993| , |Nummelin 1984] . [Roberts fc Rosenthal 2005] and 
[Nummelin 2002| is nothing more then a place for notions and tools frequently 
used in later chapters. 



2.1 Stationarity and Ergodicity 

Although majority of the results we describe carry over to the setting where 
A" is a general set and B(X) is a countably generated a— algebra (see e.g. 
|Meyn fc Tweedie 1993| ), in our applications driven development we believe 
Polish spaces offer more then sufficient generality and a grat deal of "comfort". 
Thus, if not stated otherwise, the state space X shall be a Polish and B(X) 
shall denote the Borel a— algebra on X. A transition kernel P on (X, B(X)) 
is a map P : X x B(X) — > [0, 1], such that 

• for any fixed A G B(X) the function P(-, A) is measurable, 
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for any fixed x G X the function P(x, •) is a probability measure on 
(X , B{X)). 



For a probability measure [i and a transition kernel Q, by \iQ we denote 
a probability measure defined by 



We will also use E^g for /j,g, especially if // = 8 X we will write E x g. For 
transition kernels Qi and Q 2 , Q1Q2 is also a transition kernel defined by 



Let (X n ) n > denote a time homogeneous Markov chain on X evolving 
according to the transition kernel P, i.e. such that C(X n+ i\X n ) = P(X n , ■). 
By 7r denote the distribution of X , i.e. the initial distribution of the chain. 
Then, using the above notation the distribution of X n is 7r n = vr P n . In 
particular, if 7r = 8 X , then X n is distributed as 7r n = 5 x P n = P n (x,-). Clearly 
the behavior of n n is of our vital interest. 

We say that a probability distribution 7r is stationary for P, if nP = ir. 
A crucial notion related to stationarity via Proposition 12.1.21 is reversibility. 

Definition 2.1.1. A Markov chain on a state space X with transition kernel 
P is reversible with respect to a probability distribution 7r on X, if 




furthermore if g is a real- valued measurable function on X let 





and 






for all A,B £ B{X) 



we shall write equivalently 



7i(dx)P(x, dy) 



7f(dy)P(y,dx), for all x,yEX. 



18 



Proposition 2.1.2. If a Markov chain with transition kernel P is reversible 
with respect to it, then n is stationary for P. 

Proof. 

■kP(A) = I P{x,A)ix(dx) = I P(y,X)n(dy)= I ir(dy) = ir(A). 

J X J A J A 

□ 

It is straightforward to check that the acceptance probability a(x, y) of 
the Metropolis algorithm of Section [l.2.1l makes the procedure reversible with 
respect to 7r and thus it has 7r as its stationary distribution. 

Also the i— th component Pi of the Gibbs sampler of Section 11.2.21 is a 
special case of the Metropolis algorithm (with a(x,y) = 1) and hence it 
is stationary for Pi. This implies that the random scan Gibbs sampler is 
reversible and has tc as its stationary distribution. The deterministic scan 
Gibbs sampler usually is not reversible, however since n is stationary for each 
Pi, it is also stationary for P. 

Obviously stationarity is not enough for the applications in question since 
it does not even imply ir n — > n (see [Ro berts fc Rosenthal 2005] for exam- 
ples), not to mention justifying any of the estimation schemes (11.3H1.5I) . One 
needs some more assumptions and notions to investigate convergence of 7r n 
to Ti and properties of estimation strategies of previous sections. 

In particular the total variation distance is a very common tool to evaluate 
distance between two probability measures \i\ and /i 2 and is defined as 

||A*i-J*2|U= su P I A*i ( A ) - P2{A)\. (2.1) 

AeB(X) 

We shall distinguish between the two following types of convergence to it. 

lim ||P n (x, •) — ir\\ tv = 0, for ix— almost every x G X, (2.2) 
lim \\P n {x, ■) - n\\ tv = 0, for all x e X. (2.3) 

n— +oo 

0— irreducibility and aperiodicity are properties that guarantee convergence 
in (I2T21 . 

Definition 2.1.3. A Markov chain (X) n > with transition kernel P is 4>— irreducible 
if there exists a non-zero a— finite measure on X such that for all A C X 
with 4>(A) > 0, and for all x 6 X, there exists a positive integer n = n(x, A) 
such that P n (x, A) > 0. 
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Definition 2.1.4. A Markov chain (X)„> with transition kernel P and 
stationary distribution n is periodic with period d > 2 if there exist disjoint 
subsets X , . . . , Xd-x Q X such that tt(Xi) > and for all < i < d — 1, 
and for all x G Xi, P(x, X i+ i mo d d) = 1- And (i is maximal for the property. 
Otherwise the chain is called aperiodic. 

Theorem 2.1.5. If a Markov chain (X) n > with transition kernel P and 
stationary distribution 7r on a state space X is <p— irreducible and aperiodic, 
then (2.2) holds. 

Moreover, if a function f : X — > K is such that 7r(|/|) < oo, then a strong 
law of large numbers holds in the following sense 

^ n—l 

- Y] f(Xi) -»• 7r/, as n-^oo, w.p. 1. (2.4) 
n ' 

i=0 

The foregoing convergence result is one of many possible formulations. A 
proof of the first part can be found in [Roberts &: Rosenthal 2005] Section 4.6 
and the strong law of large numbers part results e.g. from Theorem 17.0.1 of 
|Meyn fc Tweedie~9 93|. Theorem 12.1.51 is widely applicable to MCMC algo- 
rithms. The Metropolis algorithm and the Gibbs samplers of Section [L2l are 
designed precisely so that ti is stationary. Also, it is usually straightforward 
to verify that the chain is aperiodic and ^—irreducible with e.g. being the 
Lebesgue measure or <p — 11 ■ 

The following example due to C. Geyer (cf. [Roberts fc Rosenthal 20 05J) 
provides a simple Markov chain that exhibits a "bad" behavior on a null set. 

Example 2.1.6. Let X = {1,2, . . .} and define transition probabilities by 
P(1,{1}) = 1, and for x > 2, let P(x,{l}) = 1/x 2 and P(x,{x + 1}) = 
1 — 1/x 2 . Then the chain is aperiodic and it = 5\ is the invariant distribution. 
The chain is also 7r— irreducible. However, if X = x > 2, then P(X n = 
x + n for all n) > 0, and ||P n (x, •) — 7r(-)|| 0. Thus the convergence holds 
only for x = 1 which in this case is 7r— a.e. x G X. 

To guarantee convergence for all x G X, as in ( 12.31) one needs to assume 
slightly more, namely Harris recurrence. 

Definition 2.1.7 (Harris Recurrence). A Markov chain (X n ) n ^ with tran- 
sition kernel P and stationary probability measure it is Harris recurrent if 
for all A G B(X), such that ir(A) > 0, and all x G X, the chain started at x 
will eventually reach A with probability 1, i.e. P(3n : X n G A\X = x) = 1. 
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Theorem 2.1.8. Ergodicity as defined in \2. 31) in equivalent to Harris re- 
currence and aperiodicity. 



The foregoing Theorem l2.1.8l results form Proposition 6.3 in [Nummelin 19 84J. 
Harris recurrent and aperiodic chains are often referred to as Harris ergodic. 

The speed of convergence in (12.2ft or (12.31) is another natural criterion for 
classifying chains. Geometrically ergodic and uniformly ergodic chains are of 
particular interest. 

Definition 2.1.9 (Uniform Ergodicity and Geometric Ergodicity). We say 
that a Markov chain (X n ) n > with transition kernel P and stationary distri- 
bution it is 

• geometrically ergodic, if ||P n (x, ■) — 7r(-)|| to < M(x)p n , for some p < 1 
and M(x) < oo 7r— almost everywhere, 

• uniformly ergodic, if ||P n (x, ■) — 7r(-)|| fo < Mp n , for some p < 1 and 

M < oo, 

The difference between geometric ergodicity and uniform ergodicity is 
that M may depend on the initial state x. Obviously, if a chain is geometri- 
cally ergodic and M(x) is a bounded function, then the chain is also uniformly 
ergodic. In particular, if the state space is finite, then every geometrically 
ergodic Markov chain is uniformly ergodic. (And from the standard theory 
of discrete state space Markov chains we know that every ergodic chain is 
uniformly ergodic.) Verifying uniform or geometric ergodicity is in general 
nontrivial and we will refer to it later. An interesting result for the algo- 
rithms presented in Chapter Q] is for example that a symmetric random- walk 
Metropolis algorithm is geometrically ergodic if and only if ir has finite ex- 
ponential moments, as shown in |Mengersen fc Tweedie 1 996 1. 

Since in the sequel we deal with integrals of unbounded functions / with 
respect to probability measures, the very common total variation distance 
defined by (12.11) is in this case inappropriate for measuring distances between 
probability measures and we need to introduce the V— norm and V— norm 
distance. 

Let V : X — ► [1, oo) be a measurable function. For measurable function 
g : X — > R define its V-norm as 

i | \9{x)\ 

\9\v ■= SU PT7TT- 
xe x V(x) 
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To evaluate the distance between two probability measures \i\ and p,2 we use 
the V-norm distance, defined for probability measures pi and P2 as 



Wi ~ M2 1| v := sup {fag - p 2 9\ ■ 
\g\<v 

Note that for V = 1 the V— norm distance || • \\y amounts to the total 
variation distance, i.e. \\p\ — faWv — ^ sw PAeB(x) l/^iC^) ~~ A^C^)! = 2||pi — 
1 1 *u - Finally for two transition kernels Qi and Q2 the V-norm distance 
between Qi and Q2 is defined by 

inn n ill i \ ni Ml I \\Qi(x, ■) ~ Q 2 {x, -)\\v 
\\\Qi- Qalllv := HQi(av) - Ollv v = SU P 



y(ar) 



For a probability distribution /x, define a transition kernel p(x,-) := //(•), 
to allow for writing |||Q — and \\\pi — p^Wlv- E)efine also the following 
Banach space 

By :={/:/: * - fl, < 00}. 

Now if |||<5i — Q2IHV < °°; then Qi — Q 2 is a bounded operator from .By to 
itself, and \\\Qi — Q2WW is its operator norm. See Meyn &: Tweedie 1 993| 
Chapter 16 for details. 

Now we are in a position to introduce the V— uniform ergodicity. 

Definition 2.1.10 (V— uniform ergodicity). We say that a Markov chain 
(X n ) n > with transition kernel P and stationary distribution 7r is V— uniformly 
ergodic, if 

\\\ pn - AWv -> 0, as n^oo. (2.5) 
Moreover, since ||| • |||y is an operator norm (12.51) is equivalent to 

\\\P n -7r\\\ v < Mp n , for some M < 00 and p < 1. (2.6) 



2.2 Small Sets and the Split Chain 

The regeneration construction has been invented independently by [Nummelin 1978] 
and [Athreya fc Ney 1978| and is now a very celebrated technique. The de- 
velopment of this approach resulted in intuitive and rather simple proofs 
of most results about Markov chains and enabled better understanding and 
rapid progress of the theory. In this section we provide the basics of the 
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regeneration and split chain construction needed for the following chapters. 
Systematic development of the theory can be found in [Nummelin 1984) and 
|Meyn fc Tweedie 1993| which we exploit here. 

We begin with the following definition of an atom. 

Definition 2.2.1 (Atom). A set B G B(X) is called an atom for a Markov 
chain (X) n >o with transition kernel P if there exists a probability measure 
v on B(X), such that for all x G B, 

P{x,-) = u{>). 

If the Markov chain is ^—irreducible and if)(B) > then B is called an 
accessible atom. 

A single point x G X is always an atom. For a discrete state space 
irreducible Markov chain every single point is an accessible atom. Much of 
the discrete state space theory is developed by studying Markov chain tours 
between consecutive visits to a distinguished atom c G X. On a general state 
space accessible atoms typically do not exist. However such atoms can be 
artificially constructed. First we provide a general version of a minorization 
condition that enables this construction. 

Definition 2.2.2 (Minorization Condition - a general version). Let s : X — > 
[0, 1] be a function for which E w s > and there exists an m > and such a 
probability measure v m on B(X), that for all x G X, 

P m (x,-)>s(x)v m (-)- (2-7) 

However, a special case of this condition with s(x) = elc(x) usually turns 
out to be as powerful as the general version and is often more suitable to work 
with. 

Definition 2.2.3 (Small Set). A set C G B(X) is u m — small, if there exist 
m > 0, £ > 0, and a probability measure v m on B(X), such that for all x G C, 

P m (x,-)>eu m (-). (2.8) 



Remark 2.2.4. Theorem 5.2.2 of |Meyn fc Tweedie 1993| states that any ^—irreducible 



Markov chain is well-endowed with small sets C of positive measure ip and 
such that u m (C) > 0. Since ergodic Markov chains are 7r— irreducible, for an 
ergodic chain a small set C with ir(C) > and v m (C) > always exists. 
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Definition 12.2.31 and Remark 12.2.41 imply the following minorization con- 
dition. 

Definition 2.2.5 (Minorization Condition). For some e > 0, some C such 
that ip(C) > 0, and some probability measure v m with u m (C) = 1 we have 
for all x G C, 

P m (x,-)>eu m (-). (2.9) 

The minorization condition (I2.9P allows for constructing the split chain 
for (X n ) n > which is the central object of the approach (see Section 17.3 
of |Meyn &: Tweedie 1993 for a detailed description). Let {X nm ) n > be the 
m— skeleton of (X n ) n > , i.e. a Markov chain evolving according to the m— step 
transition kernel P m . The minorization condition allows to write P m as a 
mixture of two distributions: 

P m (x, •) = el c {x)v m {-) + [1 - £l c (x)]R(x, •), (2.10) 

where R(x, ■) = [1 - el c (x)]~ 1 [P(x, ■) - el (x)u m (-)]. Now let (X nm ,Y n ) n > 
be the split chain of the m— skeleton i.e. let the random variable Y n G {0, 1} 
be the level of the split m— skeleton at time nm. The split chain (X nm , Y n ) n > 
is a Markov chain that obeys the following transition rule P. 

P(Y n = l,X( n+ i) m G dy\Y n -i,X nm = x) = el c {x)v m (dy) (2.11) 
P(Y n = 0,X {n+1)m edy\Y n _ u X nm = x) = (l-eI c (x))R(x,dy), (2.12) 

and Y n can be interpreted as a coin toss indicating whether X( n+ i) m given 
X nm = x should be drawn from u m (-) - with probability elc(x) - or from 
R(x, •) - with probability 1 — elc(x). 

Obviously (X nm , Y n ) n > , i.e. the split chain of the m— skeleton is a Markov 
chain and the crucial observation follows from the Bayes rule, namely the set 
a:=Cx{l}isan accessible atom for this chain. 

One obtains the split chain (Xk, Y n )k>o, n >o of the initial Markov chain 
(X n )„> by defining appropriate conditional probabilities. To this end let 
X™ m = {X , . . . , X nm _x} and Yq = [Y , . . . , Y n _i}. 

PO^n — l,Xnm+i G dx\, . . . , X( n+ i) m _i G dx m -\, X( n+ i) m G dy\ (2-13) 

\Y n ,Xr;X nm = x) = ^p^yf P^dx,) ■ --Pix^dy), 
P(Y n = 0, X nm+1 G dxi, X (n+ i )m _i G dxm-i, X( n+ i) m G dy\ (2.14) 
l^o ) i X-nm — x) — P m [x dy) P^ x i dx\) ■ ■ ■ P{x m —\, dy), 
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where pw^Jy) an d p^xdy) are Radon-Nykodym derivatives. Note that the 
marginal distribution of (Xk)k>o in the split chain is that of the underlying 
Markov chain with transition kernel P. 

An important characterization of the invariant measure obtained via the 
splitting technique is a generalization of the Kac's Theorem, namely Theorem 



12.2.81 which is the key conclusion of Chapter 10 in Meyn &: Tweedie 1 993 
Let 

OO / OO 

U(x, A) := pn (^ A) = EJY1 M*n) 

n=l ^ n=l 

and for a measure if) define 

B + {X) := {A G B{X) : iff (A) > 0}. 

Definition 2.2.6 (Recurrent Chains). A chain (X„) n > with a transition 
kernel P is called recurrent if it is ip— irreducible and U(x, A) = oo for any 
x G X and every A G B + (X). 

Remark 2.2.7. Recurrence is a weaker condition then Harris recurrence, in 
particular the Markov chain defined in Example 12.1.61 is recurrent but not 
Harris recurrent. 

Moreover, for a set A G X define its hitting time t a as 

ta '■= min{ra > 1 : X n G A}. 

Theorem 2.2.8. Let the Markov chain (X n ) n ^ be recurrent. Then there ex- 
ists an unique (up to constant multiples) invariant measure ir u . This measure 
ix u has the following representation for any A G B + (X) 



7T U {B) = I E x 
J A 



n=l 



7i u (dx), B G B{X). (2.15) 



Moreover, the measure tt u is finite if there exists a small set C such that 

sup E x [t c \ < oo. 

To take advantage of the splitting technique for analyzing Markov chains 
and functionals of Markov chains we need a bit more formalism. For a 
measure A on (X, B(X)) let A* denote the measure on X x {0, 1} (with 
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product a-algebra) denned by \*(B x {1}) = eX(BnC) and \*(B x {0}) = 
(1 - e)X(B n C) + \{B n C c ). In the sequel we shall use z4 for which v* m {B x 
{1}) = ev m (B) and ^(Bx{0}) = (1— e)v m (5) due to the fact that ^ m (C) = 
1. 

Now integrate 02.131) over xi, . . . , x m _i and then over y. This yields 

P(Y n = l,X (n+1)m e dy\Y n ,X™;X nm = x) = el c {x)u m {dy), (2.16) 

and 

P(Y n = l\Y n ,X™;X nm = x) = el c (x). (2.17) 
From the Bayes rule we obtain 

P{X (n+l)m G dy\Y n ,X™;Y n = l,X nm = x) = v m {dy), (2.18) 

and the crucial observation due to Meyn and Tweedie, emphasized here as 
Lemma 12.2.91 follows. 

Lemma 2.2.9. Conditional on {Y n = 1}, the pre—nm process {X kl Yi : k ^ 
nm,i ^ n} and the post—{n + l)m process {X k ,Yi : k ^ (n + l)m,i ^ 
n + 1} are independent. Moreover, the post—(n + l)m process has the same 
distribution as {X k , Yi : k ^ 0,i ^ 0} with for the initial distribution of 
(X ,Y ). 

Next, let a & {n) denote entrance times of the split chain to the set a = 
C x {1}, i.e. 

c"d(0) = minjfc ^ : Y)~ — 1}, o-a{n) = min{/c > a(n — 1) : Y)~ — 1}, ^ 1, 
whereas hitting times r & {n) are defined as follows: 

Tq(1) = min{&; ^ 1 : Yk = 1}, TQ,(n) = min{A; > Ta(n— 1) : = 1}, n ^ 2. 
In view of Lemma [2.2.91 it should be intuitively clear that the following tours 

{{X( a& (n)+l)m, -X"(o-&(n)+l)m+l , • • • > ^ {a & (n+l)+l)m-l} , U = 0, 1, . . . } 

that start whenever X k ~ v m are of crucial importance. In fact in the next 
chapter they will turn out to be much more tractable then the crude chain 
(X n ) n ^ on X . 
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Since we are interested in functional of the Markov chain (X n ) n ^ , for a 
real-valued function, say g, on X, we define here also 

m(<r a (i+l)+l)-l er a (i+l) 

Si = Si (g)= 9{Xj)= Z ^), (2.19) 

j=m(oa{i) + l) j=aa(i) + l 

where 

m— 1 

^•0?) = ( 2 - 2 °) 

fc=0 

Remark 2.2.10. Clearly, one can construct the split chain based on the more 
general minorization condition (12.71) instead of (12.91) . We chose (12.91) for 
simplicity. However, we use the split chain construction based on (|2.7p in 
Chapter @J 
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Chapter 3 

A Complete Characterisation of 
yjn— CLTs for Ergodic Markov 
Chains via Regeneration 



Central limit theorems for functionals of general state space Markov chains 
are of crucial importance in sensible implementation of Markov chain Monte 
Carlo algorithms as well as of vital theoretical interest. Different approaches 
to proving this type of results under diverse assumptions led to a large variety 
of CTL versions. However due to the recent development of the regeneration 
theory of Markov chains, many classical CLTs can be reproved using this 
intuitive probabilistic approach, avoiding technicalities of original proofs. In 
this paper we provide an if and only if characterization of i/n— CLTs for 
ergodic Markov chains via regeneration and then use the result to solve the 
open problem posed in [Roberts fc Rosenthal 2005] . We then discuss the 
difference between one-step and multiple-step small set condition. 

Results of this chapter are based on paper [ Bednorz, Latala fc Latuszyhski 2 008| 
and are joint work with Witold Bednorz and Rafal Latala. 



3.1 CLTs for Markov Chains 

Let (X n ) n ^ be a time homogeneous, ergodic Markov chain on a measurable 
space (X,B(X)), with transition kernel P and a unique stationary measure 
Ti on X. We remark that here ergodicity means that 

lim ||P n (x, •) - 7r|| to = 0, for all x E X, (3.1) 



28 



where || • \\ tv denotes the total variation distance. The process {X n ) n ^ may 
start from any initial distribution 7r . Let g be a real valued Borel function 
on X, square integrable against the stationary measure it. We denote by g its 
centered version, namely g = g — J gdn and for simplicity S n := Yl^o di^i)- 
We say that a y/n— CLT holds for (X n ) n ^ and g if 



as n 



oo, 



(3.2) 



where a 2 g < oo. 

Central limit theorems as defined by condition (13 .21) are crucial for as- 
sessing the quality of Markov chain Monte Carlo estimation as we demon- 
strate in Chapter [4] (c.f. [Jones et al. 2006] and |Geyer 1992| ) and are also 
of independent theoretical interest. Thus a large body of work on CLTs for 
functionals of Markov chains exists and a variety of results have been estab- 
lished under different assumptions and with different approaches to proofs 
(see [Jones 2 005] for a review). 

First we aim to provide a general result, namely Theorem l3.3.1l that gives 
a necessary and sufficient condition for a/^-CLTs for ergodic chains (which is 
a generalization of the well known Theorem 17.3.6 |Meyn fc Tweedie 1993| ). 
Assume for a moment that there exists an accessible atom a G B(X), i.e. 
such a set a that n(a) > and there exists a probability measure v on B(X), 
such that P(x, A) = v(A) for all x G a. Let r a be the first hitting time for 
a. In this simplistic case we can rephrase our Theorem 13.3.11 as follows: 

Theorem 3.1.1. Suppose that (X n ) n ^ is ergodic and possess an accessible 
atom a, then the ^/n—CLT holds if and only if 



k=l 



< OO. 



(3.3) 



Furthermore we have the following formula for the variance 



o n = ir(a)E 



k=l 



We discuss briefly the relation between two classical CLT formulations for 
geometrically ergodic and uniformly ergodic Markov chains (recall Definition 
12.1.91) . Recently the following CLT provided by [ Ibragimov fc Linnik 1971| 
has been reproved in [Roberts fc Rosenthal 2005] using the intuitive regen- 
eration approach and avoiding technicalities of the original proof (however 
see Section I3T51 for a commentary). 
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Theorem 3.1.2. // a Markov chain (X n ) n ^ with stationary distribution n 
is geometrically ergodic, then a ■s/ri—CLT holds for (X n ) n ^ and g whenever 
7r(\g\ 2+s ) < oo for some 5 > 0. Moreover a 2 := j x g 2 dn+2 j x J2'^ =1 g{X )g(X n )d'K. 

Remark 3.1.3. Note that for reversible chains the condition n(\g\ 2+s ) < oo 
for some 5 > in Theorem 13.1.21 can be weakened to 7r(g 2 ) < oo as proved 
in [Roberts &: Rosenthal 1997b] . however this is not possible for the general 
case, see |Bradley 198 3] or [ Haggstrom 2005| for counterexamples. 

Roberts and Rosenthal posed an open problem, whether the following 
CLT version for uniformly ergodic Markov chains due to |Cogburn~T9 72| can 
also be reproved using direct regeneration arguments. 

Theorem 3.1.4. // a Markov chain {X n ) n ^ with stationary distribution ti 
is uniformly ergodic, then a ^fn—CLT holds for (X n ) n ^ and g whenever 
n(g 2 ) < oo. Moreover a 2 := J x g 2 dit + 2j x Y^=i g(X )g(X n )d7i . 

The aim of this chapter is to prove Theorem 13.3.11 and show how to 
derive from this general framework the regeneration proof of Theorem 13.1.41 
The outline of the chapter is as follows. In Section 13.21 we provide some 
preliminary results which may also be of independent interest. In Section f3T3l 
we detail the proof of Theorem l3.3.11 and derive Theorem 13. 1.41 as a corollary 
in Section 13.41 Section 13.51 comprises a discussion of some difficulties of the 
regeneration approach. 

3.2 Tools and Preliminary Results 

Recall the split chain construction of the previous chapter and the notation 
therein. In particular Sj, defined by (12.191) will be of our vital interest. 

In this section we take g, the centered version of g, and analyze the se- 
quence Si(g), i ^ 0. The basic result we often refer to is Theorem 17.3.1 in 
|Meyn fc Tweedie 1993 ], which states that (sj)i^o is a sequence of 1-dependent, 
identically distributed r.v.'s with Esi = 0. In our approach we use the fol- 
lowing decomposition: Sj = s t + Sj, where 



<T&(i+l)-l 



r <j a (i+l)-l 







(3.4) 



j = Va(i) + l 



z *&{i+i){g) - E n * Z aA{i+ i)(g) . 



(3.5) 
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A look into the proof of Lemma T3.2.3I later in this section clarifies that and 
Si are well defined. 

Lemma 3.2.1. The sequence (sj^o consists of i.i.d. random variables. 

Proof. First note that is a functio n of {X (fTa ( i)+1 ) m , X^ i)+1)m+ i, . . . } and 
that = 1, hence by Lemma [2.2.91 s . s 1 . s 2 . . . . are identically dis- 

tributed. Now focus on Sj,s i+fc and F^^+fc) for some k ^ 1. Obviously 
^ a (i+fc) — 1- Moreover is a function of the pre— o~ & (i + k)m process and 
s i+k is a function of the post— {a & {i + k) + l)m process. Thus and s i+k are 
independent again by Lemma [2.2.91 and for Ai,A i+k , Borel subsets of R, we 
have 

e ^} n g A, +fc }) = ^({^ g g A J+fc }). 

Let ^ ii < i 2 < • • - < %i- By the same pre- and post- process reasoning we 
obtain for A ix , . . . , A it Borel subsets of R that 

P**{{s il eA il }n---n{s il eA il }) = 

= 4 S (K g A h } n • • ■ n g VJ) • P K ({s H g 

and the proof is complete by induction. □ 



Now we turn to prove the following lemma, which generalizes the conclu- 
sions drawn in [Hobert &: Robert 2004] for uniformly ergodic Markov chains. 



Lemma 3.2.2. Let the Markov chain (X n ) n ^ be recurrent (and (X nm ) n ^ 
be recurrent) and let the minorization condition \2. 9\) hold with 7r(C) > 0. 
Then 



C(X m(1) \{X ,Y } ea)= jr,(X a&(0) \{X ,Y } ~ v*J = n c (-), (3.6) 

where 7Tc(-) is a probability measure proportional to it truncated to C, that is 
n c (B)=ir(C)- 1 n{Br\C). 

Proof. The first equation in ( 13.61) is a straightforward consequence of the split 
chain construction. To prove the second one we use Theorem 12.2.81 for the 
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split m— skeleton with A = a. Thus Ta = t & (1) and tt := 7r* is the invariant 
measure for the split m— skeleton. Let C D B G B(X), and compute 



£7T(S) = 7t(B X {1}) = / E Xj y 

J a 



~R-Bx{i}(x nm , Y n ) 

n=l 



n=0 



7r(a)^I B (X CTa 



This implies proportionality and the proof is complete. 



□ 



Lemma 3.2.3. E n *sj < ™^Jp\ < oo and (sj)i^o are l-dependent identically 
distributed v. v. 's. 



Proof. Recall that S; = J2T=o 9( X a & (i+i)m+k) ~ (XT=o 9{ X a & (i+i)m+k)) 
and is a function of the random variable 



{ X aa(i+l)iri: ■ ■ ■ ) X cra (i+l)m+m 



-il- 



ls.?) 



By Hi(-) denote the distribution of (13.71) on X m . We will show that /ij does 
not depend on i. From (12.1 3ft . (12.171) and the Bayes rule, for x G C, we obtain 



-P^nm+i £ cfoi, . . . ,X( n+ i) m _i G c?x m _i, X( n+ i) m G dy 



(3.8) 



5^-0 i 1 n — L j ^nm 



Lemma [3.2.21 together with (13.81) yields 

-P^nm £ X nm+ i G dXi, . . . , X(n+l)m-l G ^m-l)-^(n+l)m £ ^2/ (3-9) 

F n ,X rim ; y; = l;a (i (0)<n) = 7Tc {dx)^^-P(x,dx 1 )---P(x m ^ 1 ,dy). 

Note that p^^dy) 1S J us ^ a Radon-Nykodym derivative and thus (13.91) is a 
well defined measure on X m+1 , say /i(-). It remains to notice, that fii(A) = 
fi(A x X) for any Borel A C X m . Thus /Xj, i ^ are identical and hence Sj, 
z ^ have the same distribution. Due to Lemma [2.2.91 we obtain that Sj, 



32 



i ^ are 1-dependent. To prove E^sf < oo, we first note that -p 
and also ir c ( 



v m {dy) 
l (x,dy) 



<l/e 



tt(C) 



7T 



Hence 



m(A) = fi(A xZ)<: 



STl(C) 



A'-chain 



where /i C ham is defined by ir(dx)P(x, dx\) . . . P(x m _ 2 , c/x m _i / 



Thus 



, ra— 1 



m+A; ; 



,fc=0 



< 



£7T(C) 



< OO. 



Now let = Y!k=Q g(Xa & (i+i)m+k) and proceed 

1 1 



' m— 1 



~2 



£7r(C) 



m—l 



k=0 



2 — 2 

m irg 

£7T(C) ' 



,fc=0 



□ 



We need a result which gives the connection between stochastic bound- 
edness and the existence of the second moment of s { . We state it in a general 
form. 

Theorem 3.2.4. Let (X n ) n ^ be a sequence of independent identically dis- 
tributed random variables and S n = X]fc=o ^k- Suppose that (r n ) is a sequence 
of positive, integer valued r.v.'s such that r n /n — > a G (0, oo) in probability 
when n — > oo and the sequence {n~ 1 ' 2 S T ^) is stochastically bounded. Then 



EXZ < oo and EXo 



0. 



The proof of Theorem 13.2.41 is based on the following lemmas. 

Lemma 3.2.5. Let 5 G (0,1) and t := sup{t > 0: sup ^ fe ^ n P{\S k \ ^ t) ^ 
<5}. ThenP(\S Wn \ > 4t ) ^ (l-5)(5/4) 20 andP(sup fc<n |£ fc f< 3t ) ^ 1-35. 

Proof. By the definition of t there exists ^ n ^ n such that P(\S no \ ^ 
to) ^ 5. Then either P(\S n \ ^ t /2) ^ 5/2 or P(\S n \ ^ t /2) < <V 2 and 
consequently 

P(\S n - no \ > t /2) = P(|S n - S no \ > t /2) 

^ P(\S no \^t )-P(\S n \>t /2)>5/2. 



33 



Thus there exists n/2 ^ n\ ^ n such that P(\S ni \ > t /2) ^ 5/2. Let 
lOn = ani + b with ^ 6 < rii, then 10 ^ a ^ 20, 

P(|S ani | ^ 5t ) ^ P(5 ani ^ oto/2) + P(5 ani ^ -ato/2) 

^ (P(5 ni ^ t /2)) a + (P(5 ni ^ -t /2)T > (5/A)\ 

hence 

^ (5/A) a (l-5)>(l-5)(5/A) 20 . 
Finally by the Levy-Octaviani inequality we obtain 

f[ sup |Sfc|>3t < 3supP(|S fe | >t„) ^ 35. 

^ fc^n ' k^n 

□ 

Lemma 3.2.6. Let c 2 < Var(Xi), t/ien for sufficiently large n, P(\S n \ ^ 
c^/4) 1/16. 

Proof. Let (JQ') be an independent copy of (Xi) and 5^ = Y17=i -^l- Moreover 
let (£j) be a sequence of independent symmetric ±1 r.v.'s, independent of (Xi) 
and For any reals (aj) we get by the Paley-Zygmund inequality, 

P ( 1X^*1 > ^(5Z a 7 ) = P(|X a ^ ^£|^a^ J 

^ i=l i ' ^ i=l i=l ' 

a y (g5^gtf > 3 
v 4; piEr=i«^i 4 "i6 



Hence 



1=1 

q n 1 



i=l 

for sufficiently large n by the Weak LLN. Thus 
1 

8 



^p{\S n -S> n \> c -^i) <: P(\S n \>-^)+p(\S' n \>-^) 

^ 2P(\S n \>-^. 



□ 
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Corollary 3.2.7. Let c 2 < Var(-Xi), then for sufficiently large n, 



P( inf \S k \^ -cVn) ^ 2 

lOn^fe^lln 1 1 4 



-121 



3*o 

-121 



Proof. Let t be as in Lemma 13.2.51 for 5 = 1/16, then 
P inf \S k \ >t ) ^ P MSionl ^4t , 

\10n<fe<lln / V lOn^fc^lln 

= P(|5 10 „| > 4t )pf sup |£ fc | < 3t ) > 2- 

Hence by Lemma [3.2.51 we obtain i ^ Cy/n/4 for large n. □ 
Proof of Theorem \3. 2.j\ By Corollary 13.2.71 for any c 2 < Var(X) we have, 

P\\S T I #5 — ^/an\ P|| — — a|< — , inf |*St.| ^ — \/(m | > 



> PI . inf I^I^ta/— )-Pf|--«|>- 
1 1 4 V 21 7 V 1 n 1 21 



^ 2 



-121 



P 



7V, 



11 



i a , 
a > — > 2 
1 21 



-122 



for sufficiently large n. Since (n 1 ' 2 S Tn ) is stochastically bounded, we imme- 
diately obtain Var(Xi) < oo. If EX 1 then 

I ~~^S Tn I = I — ^- 1 1 — I \fn — > oo in probability when n — > oo. 

'n 1 1 r n 11 n 1 

□ 



3.3 A Characterization of i/n-CLTs 



In this section we provide a generalization of Theorem 17.3.6 of |Meyn &: Tweedie 1993 
We obtain an if and only if condition for the ^/n-CLT in terms of finiteness 
of the second moment of a centered excursion from a. 

Theorem 3.3.1. Suppose that (X n ) n ^ is ergodic and vr(g 2 ) < oo. Let v m 
be the measure satisfying \2. S\) . then the y/n—CLT holds if and only if 



ovs(0) 



n=0 



< OO. 



(3.10) 
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Furthermore we have the following formula for variance 
2 _ e7r(C) 



a 



m 



<T„(0) 

E z ^) 

n=0 



2 n r / CT «(0) 



n=0 ' ^n=cr (i (0)+l 



Proof. For n ^ define 

/„ := max{&; ^ 1 : m(o~a{k) + 1) ^ n} 

and for completeness Z n := if m(cXa(0) + 1) ^ n. First we are going to show 
that 

lu-l 



1 n— 1 , (n-1 



in probability. 



(3.11) 



j=0 v j=0 

Thus we have to verify that the initial and final terms of the sum do not mat- 
ter. First observe that by the Harris recurrence property of the chain a^(0) < 

00, P^.-a.s. and hence lim n _» oo P 7r *(m0'a(O) ^ n) = and P 7r *(oa(0) < oo) = 

1. This yields 



.. n— 1 _ ji— 1 

3=0 V 3=m(cr a (0)+l) 



0, P-a.s. 



(3.12) 



The second point is to provide a similar argument for the tail terms and to 
show that 



n-l mcr (5 (i n )+m-l 

E 9(x 3 )-— e 



0, in probability. 

(3.13) 



j=m(a & (0)+l) v j=m(<T & (0)+l) 

For e > we have 

n— 1 \ / 1 fa(Zn+l) 

^ E < ^o(^ E U\9\)>e 



j =771(0-4 (Z„)+l) 



/ 1 

fc=0 ^ * 7=1 



Now since Y^T=o P&( T &(1) ^ ^) ^ -Sa r <s(l) < 00, where we use that a is an 
atom for the split chain, we deduce form the Lebesgue majorized convergence 
theorem that (13131) holds. Obviously (13121 ) and ( I3T3D yield (I3TTT) . 



36 



We turn to prove that the condition (13.101) is sufficient for the CLT to 
hold. We will show that random numbers l n can be replaced by their 
non-random equivalents. Namely we apply the LLN (Theorem 17.3.2 in 
|Meyn fc Tweedie 1993| )) to ensure that 

[n/m]-l 

In 1 iviot) 

hm — = hm - > I{ {Xmk .Y k )e&} = , P** - a.s. (3.14) 

k=l 

Let 

n* := L7r(a)^^ _1 J, n := |~(1— £)7f (ct^nm -1 ] , n :— [(l+e)7r(a)nm~ 1 \ . 

Due to the LLN we know that for any e > 0, there exists n such that for all 
n ^ n we have -FV* (z& ^ ^ ^) ^ 1 _ £■ Consequently 

(in— 1 "* \ / i ™* \ 

I 5^ s i " 5Z s i| > v^/^j < e + P n *l n max | > /V™) + 

+ ^sf ™ax I ^ sJ >/V^)-(3.15) 

\ n— n*4-1 ' 



j=n*+l 



Since (sj)j^o are 1-dependent, := Ylj=o s j is not necessarily a martin- 
gale. Thus to apply the classical Kolmogorov inequality we define M° = 
J2JLo s 2jl{2j<fe} and = ^] °1 Si +2 jl{i + 2j<fc}, which are clearly square- 
integrable martingales (due to (13.101) ). Hence 

P-k* ( max \M n , - M l \ > fiy/n) ^ P K ( max \M° n , - M°\ > ^) + 

+Pj max \Ml-Ml\>^p) 

^ fc=0 

< Ce/r 2 £^(sjj), (3.16) 
where C is a universal constant. In the same way we show that 
P{ max _\M t - M n * +1 \ > /3y/n) ^ Cfe/T 2 ^^), 
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consequently, since e is arbitrary, we obtain 



Jn-l 



~A7 E S i ~ -fc E Sj ~* °' in P robabilit y- ( 3 - 17 ) 



3=0 v j=0 



The last step is to provide an argument for the CLT for 1-dependent, iden- 
tically distributed random variables. Namely, we have to prove that 



1 n 

~/=J2 s 3 ^N(0,a 2 ), as n^oo, (3.18) 

where 

Observe that fl3T2l) . fl3T3l) . (I3T7D and fl3T8l) imply Theorem [3331 We fix 
k ^ 2 and define := Sfc J+ i(#) + ... + Sfcj+fc_i(g), consequently £j are i.i.d. 
random variables and 

at!> = :a7 E ^ + + ^ E v ( 3 - 19 ) 



3=0 V j=0 V j=0 V j=k[n/k]+l 

Obviously the last term converges to in probability. Denoting 

a 2 := ^(0) 2 = (fc-l)^( So (^)) 2 + 2(A;-2)^( S ote) S i^)), 
a 2 := E^(s (g)) 2 . 

we use the classical CLT for i.i.d. random variables to see that 

[n/fcj-l |n/ fc J 



4= E O-A/-(0,A;-V fc 2 ), and * £ Sfei (£) ^(0, AT 1 

(3.20) 

Moreover 



lim 

n— >oo 



Kfcj-i |n/*J 

3=0 V 3=0 



(3.21) 



converges to7V(0, a 2 ), with — > oo. Since the weak convergence is metrizable 
we deduce from < KW\ . (13^01 and ( I3T2TA that (13181 ) holds. 
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The remaining part is to prove that (13.101) is also necessary for the CLT to 
hold. Note that if Ylk=o 9(Xk) / V™ verifies the CLT then Yl l j=o s j ls stochas- 
tically bounded by (I3.1ip . We use the decomposition Sj = s { + Sj, i ^ 
introduced in Section I3.2L By Lemma 13.2.31 we know that Sj is a sequence of 
1-dependent random variables with the same distribution and finite second 
moment. Thus from the first part of the proof we deduce that Y^j=Q^il 
verifies a CLT and thus is stochastically bounded. Consequently the remain- 
ing sequence Y^j=o Sj/y/n also must be stochastically bounded. Lemma 
13.2.11 states that (sj)j^o is a sequence of i.i.d. r.v.'s, hence E\s%] < oo by 
Theorem 13.2.41 Also l n /n — > 7!-(a)m'~ 1 by (|3.14p . Applying the inequality 
(a + b) 2 < 2 (a 2 + b 2 ) we obtain 

which completes the proof. □ 

Remark 3.3.2. Note that in the case of m = 1 we have Si = and for Theorem 
13.3. II to hold, it is enough to assume n\g\ < oo instead of Ti{g 2 ) < oo. In the 
case of m > 1, assuming only ir\g\ < oo and (13 . 1 01) implies the -y/n-CLT, but 
the proof of the converse statement fails, and in fact the converse statement 
does not hold (one can easily provide an appropriate counterexample). 

3.4 Uniform Ergodicity 

In view of Theorem 13.3.11 providing a regeneration proof of Theorem 13.1.41 
amounts to establishing conditions (13.101) and checking the formula for the 
asymptotic variance. To this end we need some additional facts about small 
sets for uniformly ergodic Markov chains. 

Theorem 3.4.1. If (X n ) n ^ Q , a Markov chain on (X,B(X)) with stationary 
distribution it is uniformly ergodic, then X is v m — small for some v m . 

Hence for uniformly ergodic chains f 1 2 . 9 f) holds for all £ G X. Theorem 
13.4.11 is well known in literature, in particular it results from Theorems 5.2.1 
and 5.2.4 in |Meyn fc Tweedie 1993| with their ip — ir. 

Theorem 13.4.11 implies that for uniformly ergodic Markov chains (12 . 10[ ) can 
be rewritten as 

P m (x, •) = ev m {-) + (1 - e)R(x, ■). (3.22) 
The following mixture representation of ir will turn out very useful. 
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Lemma 3.4.2. If (X n ) n ^ is an ergodic Markov chain with transition kernel 
P and 113.22) holds, then 

7T = £fl ■= e^u m {\ -e) n R n . (3.23) 

n=0 

Remark 3.4.3. This can be easily extended to the more general setting than 
this of uniformly ergodic chains, namely let P m (x, ■) = s(x)v m (-) + (1 — 
s(x))R(x, ■), s : X — > [0, 1], tts > 0. In this case n = ns J2^=o u rnR#, where 
R#(x, ■) = (1 — s(x))R(x, ■). Related decompositions under various assump- 
tions can be found e.g. in [Nummelin 2002| . [Hobert fc Robert 2004] and 
|Breyer fc Roberts 2001| and are closely related to perfect sampling algo- 
rithms, such as coupling form the past (CFTP) introduced in |Propp k. Wilson T9 96| 

Proof. First check that the measure in question is a probability measure. 

(OO \ oo 

e - e) n R n W) = £ ][>- e) n {v m R n ) (X) = 1. 
n=0 ' n=0 

It is also invariant for P m : 

(f2^m(l-e) n R n ]p m = ^p m {l-e) n R n \ev m + {l-e)R) 

\ n=0 ' ^ n=0 ' 

oo oo 



n=l n=0 



Hence by ergodicity £/i = efiP nm — > it, as n — > oo. This completes the 
proof. □ 



Corollary 3.4.4. The decomposition in Lemma \3.4.2\ implies that 

<t(0) oo 

(0 E »* m ( l {XnmeA}) = Ev* m ( l {XnmeA}l{Y =0,...,Y n -,=0}) = ^(A), 

n=0 n=0 
oo 

E u ^ ( 2J f(X nm , X nm+ i, . . . ; Y n , Y n+1 , . . . )I{y =o,...,Y n _i=o}) = 



n=0 



£ 1 E n *f(X , Xi, . . . ; Y , Yi, . . 
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Proof, (i) is a direct consequence of (13.231) . To see (ii) note that Y n is a coin 
toss independent of {Y , . . . , Y n _i} and X nm , this allows for n* instead of n on 
the RHS of (ii). Moreover the evolution of {X nm+1 , X nm+2 , Y n+1 , Y n+2 , . . . } 
depends only (and explicitly by (12.131) and (I2.14D ) on X nm and Y n . Now use 
(i). ' ' ' □ 



Our object of interest is 



<r(0) 

E Z ^) 

n=0 



s(0)^n} 



E Z n(g)\Yo=0,...,Y n - 1 -- 



0} 



n=0 



n=0 
+ 



+ 2E„ 



oo oo 



E E z mm 



(O)^fc} 



n=0 k=n+l 



A + B 



(3.24) 



Next we use Corollary 13.4.41 and then the inequality 2ab ^ a 2 + b 2 to bound 
the term A in (13T24D . 



m— 1 



A = e- 1 E n ,Z Q {g) 2 = e~ l E^[Y,9{X k 



k=0 



m—1 



m 2 %g 2 < oo. 



fc=0 



We proceed similarly with the term B 



\B\ < 2E„ 



2e~ v E^ 



(0)^n} E l^+fc(^)l^a(0)>n+fc} 
fe=l 

oo 

z o(g)\ E l^fc^l^aW^*} 



n=0 



fe=l 



By Cauchy-Schwarz, 



E n *[l {tT&( o )>k} \Z (g)\\Z k (g)\\ ^ [l {(T . (0 );>fc}Zo(#) 2 ] \J K*Z k {g) 



E n * [l { Y =o}l{Y 1 =o,...,Y k _ 1 =o}Z {g) 2 ]\/E n *Z (g) 
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Observe that {Yi, . . . , Y k _{\ and {X , . . . , X m _{\ are independent. We drop 
I{y =o} to obtain 

E n *[l {adi{0 ^ k} \Z (g)\\Zk(g)\] < (l-e^E^Zoig) 2 ^ (l-e^mW- 

Hence \B\ < oo, and the proof of (13.101) is complete. To get the variance 
formula note that the convergence we have established implies 



I = e~ x E^ 



Similarly we obtain 



Z (g) 



Zo(g)^2z k (g)i {a , 



5 (0)Ssfc} 



J := 2E U « 



CTa(O) 



k=l 



(E z »®)( E Z M 

n=cr&(0)+l 



n=0 



2£~ 1 K* 



^o(#) E Z k (g)I{a & (l)>k} 



Since 7r(C) = 1, we have a 2 g = em l (I + J). Next we use Lemma [2.2.91 and 
En*Z (g) = to drop indicators and since for / : X — > _R, also -E^-*/ = 
we have 



e(J+J) - 4 

1 ' fe=i 

oo 



E m 



Z (g)lz (g)+2j2Zk(g) 
^ fe=i 

y oo 

Z (g)[Z (g) + 2j2Zk(g) 



fc=i 



Now, since all the integrals are taken with respect to the stationary measure, 
we can for a moment assume that the chain runs in stationarity from — oo 
rather than starts at time with X Q ~ n. Thus 



al = mT x E. 



m 1 E 7 , 



-TO— 1 / OO \ 

E^)( E 

- 2=0 ^fc=-oo ' 



Z (g)l E Z k (g) 

^fc=-oo 

oo 

#(X ) E 9(X k )} = I fd7r + 2 I Y,9(X )g(X n )dn. 



k=—oo 



n=l 
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3.5 The difference between m = 1 and m^l 



Assume the small set condition (12.90 holds and consider the split chain de- 
fined by (127I3D and (EEIl) . The following tours 

(o-(n)+l)m, -^(<r(n)+l)m+l j • • • , -^0(n+l)+l)m-l }) 71 = 0, 1, ... } 

that start whenever ~ z/ m are of crucial importance to the regeneration 
theory and are eagerly analyzed by researchers. In virtually every paper 
on the subject there is a claim these objects are independent identically 
distributed random variables. This claim is usually considered obvious and 
no proof is provided. However this is not true if m > 1. In fact formulas (12.1 3f) 

and 02.141) should be convincing enough, as X mn+ i, . . . , X( n+ i) m given Y n = 1 
and X nm = x are linked in a way described by P(x, dx\) ■ ■ ■ P(x m _i, dy). 
In particular consider a Markov chain on X = {a, b, c, d, e] with transition 
probabilities 

P(a, b) = P(a, c) = P(b, b) = P(b, d) = P(c, c) = P(c, e) = 1/2, 
P(d,a) = P(e,a) = 1. 

Let ^(d) = z/ 4 (e) = 1/2 and e = 1/8. Clearly P 4 (x, •) ^ ev±(-) for every 
x G X, hence we established (12.91) with C — X. Note that for this simplistic 
example each tour can start with d or e. However if it starts with d or e the 
previous tour must have ended with b or c respectively. This makes them 
dependent. Similar examples with general state space X and C ^ X can be 
easily provided. Hence Theorem 13.3.11 is critical to providing regeneration 
proofs of CLTs and standard arguments that involve i.i.d. random variables 
are not valid. 
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Chapter 4 

Fixed- Width Asymptotics 



Determining the length of simulation for MCMC algorithms that guarantees 
good quality of estimation is a fundamental problem. One possible approach 
is to wait until width of an asymptotic confidence interval based on the 
approximation by a normal distribution becomes smaller then a user-specified 
value. This requires estimating a 1 the variance of the asymptotic normal 
distribution. In this chapter we relax assumptions required to obtain strongly 
consistent estimators of a 2 g in the regenerative setting. 

Results of this chapter (in particular the key Lemma 14.3.31 and result- 
ing from it Lemma 14.3.61 and Proposition I4.3.TH are based on the paper 
|Be dnorz fc Latuszyhski 2007| and are joint work with Witold Bednorz. 

The presentation of the fixed-width asymptotic approach is based on 
[Jones et al. 2006] . We provide only a quick sketch, since the approach 
is well known in literature (see also |Geyer 1992| , |Mykland et al. 19 95 1, 
[Hobert et al. 2002] ) and [Jones et al~2 006j is an excellent recent reference. 

4.1 Asymptotic Confidence Intervals 

Suppose that we are in the standard MCMC setting and our goal is to esti- 
mate / = E n g = f x g(x)7i(dx). Let (X n ) n ^ be a time homogeneous, aperi- 
odic and Harris recurrent Markov chain with transition kernel P and limiting 
invariant probability distribution 7r. 

Consider the estimator along one walk without burn-in, i.e. 

-. n—l 
i=0 
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of the unknown value /. By Theorem 12.1.51 /„ — > I, as n — > oo, with proba- 
bility 1. Moreover, assume for a moment that a \fn— CLT holds and let a 2 g 
be the asymptotic variance, as defined in (13 .21) . 

We will study the following sequential procedure. Let n* = n*(e) be the 
first time that 



q.^+p(n)<e, (4.2) 



n 



where a\ is an estimate of a 1 at time n, and g, is an appropriate quantile, 
p{n) > is a strictly positive decreasing function on Z + , and e > is the 
desired half-width. 

At time n* we build an interval I*(s) := [I n * — e,I n * + e] of width 2e. 
For independent samples such procedures are known to work well and be- 
long to classical results of sequential statistics (c.f. [Chow fc Robbins~T9 65]. 
[Nadas 1969| and |Liu, W 1997| ). However in our context we have to apply 
the following result form |Glynn fc Whitt 1 992 . 



Theorem 4.1.1 (Glynn & Whitt 1992). If 

(a) A functional central limit theorem holds, i.e. as n — > oo, the distribu- 
tion of 

\nt\ 

Y n (t) := 

converges to Brownian motion with variance o 2 weakly in the Skorohod 
space on any finite interval, 

(b) cr^ — > a 2 with probability 1 as n — > oo, 

(c) The sequence p(n) is strictly positive and decreasing andp(n) = o(n -1 / 2 ), 
then 

P{I 6 r(e)) -> 1 - 6, as e^O. (4.3) 

Markov chains often enjoy a functional central limit theorem under the 
same conditions that ensure the standard y/n— CLT. In particular the follow- 
ing results are well known: 

Theorem 4.1.2. Assume (X n ) n ^ is a Harris ergodic Markov chain. If one 
of the following conditions holds, then a functional central limit theorem also 
holds. 
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(a) (due to IDoukhan et al. 1991$ ) The chain is geometrically ergodic and 
E4g 2 (x){\og + \g(x)\)] < oo, 

(b) (due to ^ Roberts & Rosenthal 1997b^ ) The chain is geometrically er- 
godic, reversible, and E n g 2 (x) < oo, 

(c) (due to \Billingsley 19 68]) The chain is uniformly ergodic and E 7T g 2 (x) < 
oo. 

The goal of this chapter is to obtain additionally condition (b) of Theorem 
14. 1.11 for a suitable estimator a 2 , of a 2 , under possibly weak assumptions and 
consequently conclude (14.31) . In particular we will need stronger assumptions 
then those listed in Theorem 14.1.21 thus condition (a) of Theorem 14.1.11 will 
hold automatically. 



4.2 Estimating Asymptotic Variance 

We will discuss two methods of estimating the asymptotic variance described 
in [Jones et al. 2006] . based on batch means and regenerative simulation. 

4.2.1 Batch Means 

For the bath means estimator suppose that n — 1 iterations of the algorithm 
are performed and we partition the trajectory of length n into a n blocks of 
length b n i.e. 

n — a n b n 



Define Yi, . . . ,Y an as 

Y r .= - £ aft). 

i=(i-l)&n 

Then the bath means estimate of a 2 is 



°BM 



T E^-^) 2 - ( 4 - 4 ) 



In the next section we provide an appropriate strategy for choosing a r , 
and b n for a 2 BM to be a consistent estimator. 
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4.2.2 Regenerative Estimation 



Assume that the following minorization condition with m = 1, as introduced 
in Definition E22] holds. 



By straightforward modification of the split chain construction of Section 
12.21 we obtain a bivariate process (X n , Y n ) n > that evolves according to the 
following transition rule: 

• given X n = x, draw Y n ~ Bernoulli(s(a;)) 

• If Y n = 1, then draw X n+1 ~ z/(-), otherwise draw X n+1 ~ R(x, •). 

Moreover, the artificial atom a is now of the form a = X x {1}. Let us 
simplify the notation of Section 12.21 by setting r n = T & (n), for n = 1,2,... 
Suppose also that X ~ v and set r = — 1 to keep notation coherent with 
probabilistic behavior of the chain. Define also N{ = T i+ i—Ti, for i — 0, 1, . . . , 
and recall defined by (|2. 191) . Since m — 1, 



and observe that the (iVj, Si) pairs are iid random variables. 

For regenerative estimation of the asymptotic variance we will need (^)i>o, 
thus we must simulate the split chain (Xj, Yj)j>o, not only the initial chain 
(Xi)i> . However the simulation from R(x, •) in real life examples is often 
challenging. The following solution to this problem is provided in |Mykland et al. 1995| 

Suppose that P(x, •) has a density k(-\x) and z^(-) has a density v (•) with 
respect to a reference measure dx. Given Xi = x draw X i+ i ~ k(-\x) and 
draw Yi from the distribution of X i+ i, that is 



P(x, •) > s(x)v(-), for all x E X, 



(4.5) 



and define the residual transition kernel R(x, dy) as 




(1 - s(x))" 1 (P(x, dy) - s(x)u(dy)) if s(x) < 1, 
if s(x) = 1. 
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The method is feasible in many settings of practical interest (cf. |Mykland et al. 1995| 
[Jones et al. 2006j V 

Once we are able to simulate the split chain (Xj,Fj)j> , we can observe 
To, Tx, . . . and compute the following regenerative estimator of /. 



TR + 



j=0 



(4.6) 



where the fixed number R is the total number of regenerations observed. 
Note that I TR is a sum of fixed number of iid. random variables. Thus if 

E u Nq < oo and E v Sq < oo then 



v^(4-/)->iv(o,e 



as R —> oo, 



(4.7) 



where 



r2 _ E v (s - N I) 2 

(E v N y ■ 



Let N = R 1 (t r + 1) = R 1 J^f^ iVj. As an approximation for £;? one can 



A=0 

take the following regenerative estimator 

^ R-l 



i=0 



(4.8) 



Now observe that 



R-l 



E v (s - N I) 2 E u (s - N iy< 



i=0 
R-l 



N 2 



(E V N ) 



E [(* ~ ^if ± (si ~ NJ) 2 - E u (s - N I) 2 

i=0 



+ 



+E u (s - N lf 



[N 2 (E U N ) 2 \ ■ 



As noticed in [Jones et al. 2006] . repeated application of the strong law of 
large numbers (with R — > oo) yields that is a strongly consistent estima- 
tor of so it is enough to establish conditions E u Nq < oo and E v s\ < oo for 
the fixed width methodology to work. This is deferred to the next section. 
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Clearly, in this modified regenerative setting an asymptotically valid fixed- 
width result is obtained by terminating the simulation the first time that 

q .^L + p(R)<e. (4.9) 
V H 



4.3 A Lemma and its Consequences 

For geometrically ergodic Markov chains hitting times for sets of positive 
stationary measure have geometrically decreasing tails. In particular the 
following lemma is shown in [ Hobert et al. 20 02J . 

Lemma 4.3.1 (Lemma 2 of [H obert et al. 2002] ). Let {X n ) n ^ be a Har- 
ris ergodic chain and assume that ( |^.5| ) holds. If {X n ) n ^ is geometrically 
ergodic, then there exists a (3 > 1, such that E n (3 T1 < oo. 

Which immediately yields the following corollary. 



Corollary 4.3.2. Under the conditions of Lemma 4-3.1, for any a > 



Proof. 



oo oo 

( P ^i > * + !)) a < (K/3 ri ) a $>- a(m) < oo. (4.10) 

i=0 i=0 

oo oo 

X;(p,r(7i>*+i)) a < ^(K(i {T1 > m} /3 Ti r (m) )) a 

i=0 

oo 

= ^/3- a(4+1) (K(I{n>, + l}/5 
i=0 

oo 

< ^/r a(m) (^ 



i=0 i=0 

oo 



a 



i=0 



□ 



Observe also that we can integrate (14.51) with respect to n and obtain 
7r(-) > cu(-), where c = E n s. Thus for any function h : X°° — > R, 

E w \h(X , X u ...)\> cE u \h(X , X u ...)\. (4.11) 

Now we are in a position to prove our key result, namely the following 
lemma. 
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Lemma 4.3.3. Let (X n ) n ^ be a Harris ergodic Markov chain, assume the 
minorization condition (f^.<5| ) holds, and (X n ) n ^ is geometrically ergodic. Let 
g : X — > R be a real valued Borel function. Then, if 



Ett\ g\ p+s < oo for some p>0 and 5 > 0, 

then 

E u Nq < oo and E u \so\ p < oo. 
Remark 4.3.4. Lemma 14.3.31 improves the two following results: 

• Theorem 2 of [Hobert et al. 2002] that provides the implication 

En\g\ 2JrS < oo => E u Nq < oo and E u \s \ 2 < oo. 



Lemma 1 of [Jones et al. 2006] that for p > 1 provides implications 



E w \g\ 2P 1)+S < oo E U N$ < oo and E u \s \ p < oo. 

and 

E^\g\ 2P+s < oo E^iVj < oo and E v \s \ p+S < oo. 

Remark 4.3.5. Without additional restrictions E^g^ < oo does not imply 
E u \sq\ p < oo, so Lemma l4.3.3l can not be improved. To see this note that The- 
orem GT3J] of Chapter [3] combined with the presumption that in the setting of 
Lemma 14.3.31 E n \ g\ p < oo implies E u \so\ p < oo yields the Central Limit The- 
orem for normalized sums of g{Xi) for geometrically ergodic Markov chains 
assuming only E n g 2 < oo. This however is not enough for the CLT, Bradley in 
|Bra dley 1983| and also Haggstrom in [Haggstrom 2005| provide counterex- 
amples. Hence to obtain the implication E n \g\ p < oo =>• E u \s \ p < oo, one 
needs stronger assumptions, e.g. if p = 2 then uniform ergodicity is enough, 
as proved in Chapter [3j 



Proof of Lemma \4.3.3[ First note that by (14.111) it is enough to show that 



E^Nq < oo and E n \so\ p < oo. 

Moreover, since max*,. < oo for every p > and /3 > 1, by Lemma 

14.3.11 we obtain immediately E n N^ < oo. Thus we proceed to show that 
En\so\ p < oo. To this end first note that 

C := ((KKXOr 5 )^)^ < oo. (4.12) 
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Forp > 1 we use first the triangle inequality in L p , then Holder inequality, 
then (I4T21) and finally Corollary 3321 



p\Vp 



< 



p-i i/p 



i/p 



< 



i/p 



\i=0 

(oo 
Y,^<n)\g{X % 
i=0 

i=0 

oo 

oo 

= C ^2 ( p n(n > < OO. 



1/p 



(4.13) 



For < p < 1 we use the fact x p is concave and then proceed similarly as in 
([4~T3l to obtain 

/ oo 

E n \s \ p < Enl^Hi^riMXi 



J=Q 



< 



i=0 



i=0 



□ 



Lemma l4.3.3l allows us to restate results from section 3.2 of [Jones et al. 2 006J 
with relaxed assumptions. In particular in Lemma 2 and in Proposition 3 
therein it is enough to assume E w \g\ 2+s+£ < oo for some 5 > and some e > 0, 
instead of E n \g\ 4+S < oo for some 5 > 0. Modifications of the (rather long 
and complicated) proofs in [Jones et al. 2006] are straightforward. Hence we 
have 

Lemma 4.3.6 (Part b of Lemma 2 of [Jones et al. 2006] ). Let (X n ) n ^>o be 
a Harris ergodic Markov chain with invariant distribution ir. If (X n ) n ^ is 
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geometrically ergodic, j[4-5\) holds and E n \g\ 2+5+E < oo for some 5 > and 
some e > 0, then there exists a constant < a g < oo, and a sufficiently large 
probability space such that 



i=l 



0( 7 (n)) 



wif/i probability 1 as n ^ oo, where ^{n) = n a logn, a = 1/(2 + 5), and 
B = {B(t),t > 0} denotes a standard Brownian motion. 

Proposition 4.3.7 (Proposition 3 of [Jones et al. 20 06]). Let (X n ) n j> be a 
Harris ergodic Markov chain with invariant distribution ir. Further, suppose 
(X n )„^ ^ geometrically ergodic, ( |^.5| J holds and E n \g\ 2+s+£ < oo for some 
5 > and some e > 0. If 

1. a n — > oo, as n — > oo, 

£ 6 n — > oo anc? 6 n /n ^ as n — > oo, 

5. fr^n^flogn] 3 ^ as n — > oo, w/iere a = 1/(2 + 5), 

4- there exists a constant c > 1, such that Yl^=i(^n/ n ) c < oo, 



Then o\ 



BM 



a 2 w.p.l as n — > oo. 



Concluding Remark 4.3.8. Compare the foregoing result with Section l4.2.2l or 
with Proposition 1 of [Jones et al. 2006] to see that both methods described 
here, i.e. regenerative simulation (RS) and batch means (CBM), provide 
strongly consistent estimators of a 2 under the same assumption for the target 
function g. 
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Chapter 5 



Fixed- Width Nonasymptotic 
Results under Drift Condition 

In this Chapter we establish nonasymptotic fixed width estimation. We as- 
sume a drift condition towards a small set and bound the mean square error of 
estimators obtained by taking averages along a single trajectory of a Markov 
chain Monte Carlo algorithm. We use these bounds to determine the length 
of the trajectory and the burn-in time that ensures (e — a)— approximation, 
i.e. desired precision of estimation with given probability. Let / be the value 
of interest and / its MCMC estimate. Precisely, our lower bounds for the 
length of the trajectory and burn-in time ensure that 

P(\I-I\ <e) > I- a 

and depend only and explicitly on drift parameters, e and a. Next we intro- 
duce an MCMC estimator based on the median of multiple shorter runs. It 
turns out that this estimation scheme allows for sharper bounds for the total 
simulation cost required for the (e — a)— approximation. For both estimation 
schemes numerical examples are provided that include practically relevant 
Gibbs samplers for a hierarchical random effects model. 

5.1 Introduction 

Recall the estimation strategies introduced in Section 11.21 and described by 
(11.3H1.5I) . Estimation Along one Walk uses average along a single trajectory 
of the underlying Markov chain and discards the initial part to reduce bias. 
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The estimate of the unknown value I = j x f{x)n{dx) is of the form 



x 

1 t+n-l 



and t is called the burn-in time. 

The strategy is believed to be more efficient then estimation along one 
walk with spacing and multiple run described in Section [TT2~1 and is usually the 
practitioners choice. Some precise results are available for reversible Markov 
chains. Geyer in Geyer 1992| shows that using spacing as in (11.41) is ineffec- 



tive (in terms of asymptotic variance) and Chan and Yue in [Chan &: Yue 1996 



prove that (15. II) is asymptotically efficient in a class of linear estimators (in 
terms of mean square error). 

The goal of this chapter is to derive lower bounds for n and t in (15.11) . 
that minimize the total computation cost n + t, and that ensure the following 
condition of (e, a)— approximation: 

P(|4„-/| <£) > 1-a, (5.2) 

where e is the precision of estimation and 1 — a, the confidence level. Due to 
results in |Geyer 1992| and [Chan &: Yue 1996] no other linear modifications 
of the estimation scheme in (15.10 are analyzed. To decrease the total simu- 
lation cost for (15.21) we introduce instead a nonlinear estimator based on the 
median of multiple shorter runs. 

Results of this or related type have been obtained for discrete state space 
X and bounded target function / by Aldous in [Aldous 1987] . Gillman in 
[Gillman 1998| and recently by Leon and Perron in [Leon &: Perron 20 04J . 



Niemiro and Pokarowski in [Niemiro &: Pokarowski 2007| give results for rel- 
ative precision estimation. For uniformly ergodic chains on continuous state 
space X and bounded function /, Hoeffding type inequalities are available 
(due to Glynn and Ormonait in |Glynn &: Ormoneit 2002] , and an improved 
bound due to Meyn et al. in |Kontoyiannis at al. 2005| ) and can easily lead 
to the desired (e — a)— approximation. To our best knowledge there are no 
explicit bounds for n and t in more general settings, especially when / is not 
bounded and the chain is not uniformly ergodic. A remarkable presentation 
of the state of the art approach to dealing with this problem is provided by 
Jones at al. in the recent paper [Jones et al. 2006] . They suggest two proce- 
dures for constructing consistent estimators for the variance of the asymptotic 
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normal distribution for geometrically ergodic split chains and thus under the 
additional assumption of E 7r \f\ 2+S < oo for some 5 > (see Chapter [4] here 
for this weakened assumption and details of the procedure). 

Our approach is to assume a version of the well known drift condition to- 
wards a small set (Assumption 15 . 2 . ill and give explicit lower bounds on n and 
t in terms of drift parameters defined in Assumption 15 . 2 . 1 1 and approximation 
parameters defined in (15.2H . 

The rest of the Chapter is organized as follows. In Section 15.21 we intro- 
duce the drift condition assumption and preliminary results. In Section 15.31 
we obtain an explicit bound for the mean square error of the estimator defined 
in ( 11.31) . In Section EH we construct two different (e — a)— approximation 
procedures, one based on the sample mean of one long trajectory and the 
other based on the median of multiple shorter runs. We close with examples 
in Sections 15.51 and 15.61 in particular we show how to obtain explicit lower 
bounds for t and n that guarantee the e — a— approximarion for a hierarchical 
random effects model of practical relevance. 

5.2 A Drift Condition and Preliminary Lem- 
mas 

Since in what follows we deal with integrals of unbounded functions / with 
respect to probability measures, the very common total variation distance 
defined by (12.11) is inappropriate for measuring distances between probability 
measures and we need to use the V— norm and V— norm distance introduced 
in Section I2TT1 

We analyze the MCMC estimation along a single trajectory under the 
following assumption of a drift condition towards a small set. 

Assumption 5.2.1. (A. 1) Small set. There exist C G B{X), (3 > and 
a probability measure v on (X,B(X)) such that for all x G C and 



(A. 2) Drift. There exist a function V : X — > [1, oo) and constants A < 1 and 
K < oo satisfying 



A G B(X) 



P(x,A) > (3u(A). 
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(A. 3) Aperiodicity. There exists (3 > such that $v{C) > (3. 

In the sequel we refer to $, V(x), X, K, (3 as drift parameters. 

Remark 5.2.2. Establishing a drift condition for real life examples is usually 
not an easy task. As indicated in |Meyn &: Tweedie 1993| polynomials are 
often suitable candidates for a drift function V and also functions propor- 
tional to 7T 1 / 2 may turn out to be a lucky choice. Computable toy and real 
life examples of [Baxendale 20 05] and [Jones k. Hobert 2004] confirm this 
observations. 

Remark 5.2.3. There is a strong probabilistic intuition behind Assumption 
15.2. 1L Every time the chain visits the small set C, it regenerates with proba- 
bility (3. The role of the drift condition (A. 2) is to guarantee that the chain 
visits the small set C frequently enough. Typically C is in the „center" of 
the state space X and the drift function V takes small values on C and in- 
creases as it goes away from C. Assume first that X n = x ^ C. The condition 
PV(x) < XV (x) means that X n+1 ~ P(x, •) is on average getting closer to 
C (closer in terms of V). Whereas PV{x) < K for X n = x e C means that 
X n+ i will perhaps jump out of C, but not too far away, i.e. the integral of 
V with respect to the distribution of X n+ i is bounded (by the same value) 
for all x G C. Assumption (A. 3) together with (A.l) imply aperiodicity. 

Assumption 15.2.11 is often used and widely discussed in Markov chains 
literature. Substantial effort has been devoted to establishing convergence 
rates for Markov chains under the drift condition (A. 1-3) or related as- 
sumptions. For discussion of various drift conditions and their relation see 
Meyn and Tweedie |Meyn k. Tweedie 1993| . For quantitative bounds on 
convergence rates of Markov chains see the survey paper by Roberts and 
Rosenthal [Roberts fc Rosenthal 2 005] and references therein. In the sequel 
we make use of the recent convergence bounds obtained by Baxendale in 
[Baxendale 2005) . 

Theorem 5.2.4 (Baxendale [Baxendale 2005] ). Under Assumption \5.2.1\ 
(X) n > has a unique stationary distribution it and irV < oo. Moreover, there 
exists p < 1 depending only and explicitly on (3, (3, X and K such that when- 
ever p < 7 < 1 there exists M < oo depending only and explicitly on 7, (3, (3, X 
and K such that for all n > 




(5.3) 
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When we refer in the sequel to V— uniform ergodicity, we mean the con- 
vergence determined by (15.31) . There are different formulas for p and M for 
general operators, self adjoint operators and self adjoint positive operators 
in both atomic and nonatomic case. We give them in Section [5781 for the sake 
of completeness. To our knowledge the above-mentioned theorem gives the 
best available explicit constants. 

Corollary 5.2.5. Under Assumption \5.2.1\ 

||7r P n - 7r||v < min{7r V; || vr - 7r||y}M7 n , 
where M and 7 are such as in Theorem \5. 2.4\ 

Proof. From Theorem 15.2.41 we have \\P n (x,-) — 7r(-)||y < Mj n V(x), which 
yields 

ttqVW > / \\P n {x,-)-7r{-)\\ v ir Q {dx)> sup / \P n (x, -)g - 7ig\n (dx) 
Jx \g\<vJx 

> sup \n P n g - ng\ = \\n P n - ir\\ v . 

\9\<V 

Now let by = inf^g^ V(x). Since 1 1 1 • |||y is an operator norm and it is invariant 
for P, we have 

|| 7roP n_ 7r || v = bv \\\Tr P n -Tr\\\ v = b v \\\(7T -7r)(P n -7r)\\\ v 

< b v \\\-K — ir\\\ v \\\P n — ir\\\ v = \\tt — ir\\ v \ \ \P n — tc\ \ \ v . 

< Iko - 7r||yM7 n . 

□ 

Now we focus on the following simple but useful observation. 

Lemma 5.2.6. If for a Markov chain (X n ) n > on X with transition kernel 
P Assumption \5.2.l\ holds with parameters f3, V(x), A, K, /3, it holds also with 
(3 r := 0, V r (x) := V{x) l ' r , \ r := A 1/r , K r := K l ' r , f3 r := (3 for every r > 1. 

Proof. It is enough to check (A. 2). For x ^ C by Jensen inequality we have 

XV(x) > £v(y)P(x,dy) > (J v(y)^ r P{x,dy) 

and hence PV(x) 1 ^ < \ l l r V(x) l l r \ as claimed. Similarly for x £ C we obtain 
PV{xf/ r < K l l r . □ 
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Lemma T5.2.6I together with Theorem 15.2.41 yield the following corollary. 
Corollary 5.2.7. Under Assumption \5.2.1\ we have 

\\\P n -7T\\\ vl/ r<M r ^, 



where M r and 7 r are constants defined as in Theorem \ 5. 2.j\ resulting from 
drift parameters defined in Lemma l5. 2.b\ 

Integrating the drift condition with respect to 7r yields the following bound 
on tcV. 

Lemma 5.2.8. Under Assumption \5.2.1\ 

w < <C) — < — . 

Let f c = f — 7r/. The next lemma provides a bound on ||/ c | p |y in terms 
of ||/| p |y without additional effort. 

Lemma 5.2.9. Under Assumption \5.2.1\ 



by 



K 1 /p-X 1 /p 



where b v = inf xe # V(x), C f p = \ \f\ p \ v and K p>x - ^^7p ' 
Proof Note that ttV 1/p < n(C)K PtX < K P)X by Lemma [5X8] and proceed: 

miv = S W) - S S W) 



C 1 f i p V 1 /p(x)+7r(C)K PtX 

< sup^ — >-< 

xdX V(x) 



V 



Jy 



□ 
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5.3 MSE Bounds 



By MSE(Io in ) we denote the mean square error of fo,n> i- e - 

MSE(I 0tn ) =E no [i , n -I] 2 . 

Bonds on MSE(I 0}n ) are essential to establish (e — a)— approximation of type 
(15.21 ) and are also of independent interest. 

Theorem 5.3.1. Assume the Drift Condition \5. 2. 1\ holds and X ~ 7r . Then 
for every measurable function f : X —* R, every p > 2 and every r G 

MSE (U < MM! U + W f, v + ^mm{y,N- T || y} \ 
n V 1 -7r/ V n(l -7) / 

(5-4) 

where f c — f — nf and constants M, 7,M r ,7 r depend only and explicitly 
on {3, (3, A and K from Assumption \5. 2. H as in Theorem \5. 2.J\ and Corollary 

The formulation of the foregoing Theorem 15.3.11 is motivated by a trade- 
off between small V and small A in Assumption l5.2.11 It should be intuitively 
clear that establishing the drift condition for a quickly increasing V should 
result in smaller A at the cost of bigger ttV. So it may be reasonable to look 
for a valid drift condition with V > C||/ C | p | for some p > 2 instead of the 
natural choice of p = 2. Lemma 15.2.61 should strengthen this intuition. The 
most important special case for p = r = 2 is emphasized below as a corollary. 

The unknown value n V in (15.41) depends on 7r which is users choice and 
usually a deterministic point. Also, in many cases a fairly small bound for 
iiV should be possible to obtain by direct calculations, since in the typical 
setting ii is exponentially concentrated whereas V is a polynomial of degree 
2. These calculations should probably borrow from those used to obtain the 
minorization and drift conditions. However, in absence of a better bound for 
ttV Lemma 15.2.81 is at hand. Similarly Lemma 15.2.91 bounds the unknown 
value ||/ c | p |y P in terms of ||/| p |i/- Note that in applications both / and V 
have explicit formulas known to the user and | y can be evaluated directly 
or easily bounded. 

Proof. Note that |/|yi /r = ||/| r |y- Without loss of generality consider f c 
instead of / and assume ||/ c | p |v = 1- In this setting |/ c 2 |y < 1, Var n f c = 
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■nfl < ttV, MSE(I 0>n ) = E no (I 0tn ) 2 , and also for every r G [^p], 

|/c|^<||/c| p/r |yV, = l and |/ C | v i- V r < \\f c \ p - p/r \ v i-Vr = 1. 
Obviously 

- n— 1 2 n— 2 n—1 

nMSE{h, n ) = VcW 2 +-E E W)ffl.(5.5) 

i=0 i=0 j=i+l 

We start with a bound for the first term of the right hand side of (|5.5I) . Since 
fc( x ) — V( x )> we use Corollary 15.2.51 for / c 2 . Let C = min{7r V, || vr - M\v} 
and proceed 

n-1 n-1 n-1 „ 

1=0 i=0 i=0 



n — ' " n * — ' n ' — ' nil — 7) 

i=0 i=0 i=0 V // 

(5.6) 

To bound the second term of the right hand side of (15. 5p note that |/ c | < V" 1 / 7 * 
and use Corollary 15.2.71 



n— 2 n—1 „ n— 2 n—1 

^ V ^ V ^ , , r. ( r \ r. ( r \ ~ 



-EE w/ffl = -EEMft/^/.)) 

i=0 j=i+l 

^ !EE-o(^(l/ c |l^-7 c |)) 



n ^ — ' — ' n 

j=0 j'=t+l i=0 j=i+l 

2 n— 2 n—1 



< 



n 

1=0 j=l+l 

2M r '~ 2 x 



E E ^rvo (p* (i/civ 1 /*-)) 



n 

i=o i=«+i 

n-2 



Since |/ c | < V 1/r and |/ c | < r 1 " 1 /^ also |/ c yVr| < y an d W e use Corollary 
EZBJfor 

"-(i-Tr)^ l-7r\ 71(1-7)/ 

Combine (I5.6P and (15. 7K to obtain 



n \ 1 — 7 r / \ n(l — 7) 
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□ 

Corollary 5.3.2. In the setting of Theorem \5. 3.11 we have in particular 



use fa < fi + 2 -Mi2i) U + Mmill{ ^ »*° " '«"> 



n V 1 — 72 / V n (l ~ 1) 



(5.8) 



The foregoing bound is easy to interpret: ^V\f1\v should be close to 
Var n f for an appropriate choice of V, moreover 2M 2 72/(1 — 72) corresponds 
to the autocorrelation of the chain and the last term Mmin{7roV, 1 1 tto — 
7r||y}/n(l — 7) is the price for nonstationarity of the initial distribution. 
See also Theorem 15.3.41 for further interpretation. 

Theorem 15.3. II is explicitly stated for I 0>n , but the structure of the bound 
is flexible enough to cover most typical settings as indicated below. 

Corollary 5.3.3. In the setting of Theorem \5. 3.11 

MSE(I ,n) < 1 + z , if tto = 7T, (5.9) 

n V 1 - 7r / 

MSE(I , n )<——^ + T —- ^ = 6., 

(5.10) 

MSE(l\ n) <^( 1 + ^)L V+ ^pM), if ^ 5x . 



n \ 1 — 7 r / \ 71(1 — 7) 



(5.11) 



Proof. Only (15.111) needs a proof. Note that X t ~ S x P t . Now use Theorem 
15X41 to see that \\5 x P l - ir\\ v < M*fV(x), and apply Theorem EXT] with 

7T = 5 X P\ □ 

Bound (15.91 ) corresponds to the situation when a perfect sampler is avail- 
able. For deterministic start without burn-in and with burn-in (15.101) and 
(15.111) should be applied respectively. 

Next we derive computable bounds for the asymptotic variance <r| in 
central limit theorems for Markov chains under the assumption of the Drift 
Condition EXU 
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Theorem 5.3.4. Under the Drift Condition \5.2.1\ the Markov chain (X n ) n > 
and a function f, such that |/ c 2 |y < oo (or equivalently \f 2 \v < 00), admit a 
central limit theorem, i.e: 

, aj) as n — > 00, (5-12) 

moreover 

a) = lim nE«[I Q>n - if < W\\f c \*$* (l + ^) . (5.13) 

Proof. The CLT (i.e. (15.121) and the equation in 05.131) ) is a well known fact 
and results from V— uniform ergodicity implied by Theorem 15.2.41 combined 
with Theorems 17.5.4 and 17.5.3 of |Meyn k Tweedie 1993| . Theorem EXU 
with 7r = 7r yields the bound for aj in (15.131) . □ 

Remark 5.3.5. For reversible Markov chains significantly sharper bounds for 
aj can be obtained via functional analytic approach. For a reversible Markov 
chain its transition kernel P is a self-adjoint operator on L\. Let / G L\ and 
nf = 0. If we denote by Ef the positive measure on (—1, 1) associated with / 
in the spectral decomposition of P, we obtain (cf. |K ipnis k. Varadhan 19 86 1, 
|Geyer 1992 Q 

a) = [ \±±E f (d\) < \^VarJ < \^*V\f c \ v . (5.14) 
7(-i,i) i-A 1 — p l p 

Where the first inequality in (15.141 ) holds if we are able to bound the spectral 

radius of P acting on L\ by some p < 1 (cf. |Geyer 1992| , [Roberts fc Rosenthal 1997b] ). 

Corollary 6.1 of [Ba xendale 2005] yields the required bound with p defined 

as in Theorem 15.2.41 

5.4 (e — a)— Approximation 

(s—a)— approximation is an easy corollary of MSE bounds by the Chebyshev 
inequality. 
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Theorem 5.4.1 ((e — a)— approximation). Let 

, 7tV\\mf / , 2M r7r 



, 1 + > (5-15) 

e 2 a \ 1 — j r J 

MmmlTro^llTro-Trllyjll/.lPl 2 ^ / + 2M r7r \ 
e 2 a(l-7) V 1-Tr/' 



nw = ^Tjg) (6l?) 

c (t) = ^wii/firA + 2j^y (5 , 18) 



(5.19) 



£ 2 a(l — 7) \ 1 — 7 
MM«f / 2^7 



e 2 a(l — 7) \ 1 — 7 
T/zen under Assumption ^. 2. 1\ 



P(| V - /| < e) > 1 - a, X ~ 7T , n > ■ (5.20) 



-I\< :) > 1 -n. //<;*> max 

n > n(£). 



0,log 7 (^g^)}(5.21) 



^4nc? t/ie above bounds in \5.21\) give the minimal length of the trajectory 
(t + n) resulting from $5.11\) . 

Proof. From the Chebyshev's inequality we get 
P(|/ tl „ - I| < e) = 1 - P(\It,n ~ I\ > e) 

> 1 - MSE j It ^ >i- a if MSE{i t , n ) < e 2 a(b.22) 

To prove (15.201) set C = min{7r V, 1 1 vtq — 7r||y}, and combine 05.221) with (15.41) 
to get 

,2 ^||/ c | p |y /p A 2M r7r \ MC\\W 2 J P f 2M rlr 



n z _ n - m^i lv 1 + z - Zi _M _ 1 _|_ ZZ^LIL ] > 0, 

e 2 a \ 1 — 7r y e 2 a(l — 7) \ 1 — 7 r . 



and hence n > b+v/ ^ 2+4c ; where 6 and c are defined by ( 15.151) and ( 15.161) 
respectively. The only difference in (15.211) is that now we have c(t) defined 
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by (15.181) instead of c. It is easy to check that the best bound on t and n (i.e. 
that minimizes t + ri) is such that 

n > n(t) and £ > max {0, min{t G N : n'(t) > -1}} , 

where n(t) is defined by (15.171) . Standard calculations show that 

min{t G N : n'(t) > -1} = min{t G iV : (7*) 2 c 2 In 2 7 - 7*4c - 6 2 < 0}, 

where c is defined by (15.191) . Hence we obtain 



Remark 5.4.2. The formulation of Theorem l5.4.1l and the above proof indicate 
how the issue of a sufficient burn-in should be understood. The common 
description of t as time to stationarity and the often encountered approach 
that t* = t(x,e) should be such that p(7r,<5 x P**) < e (where /?(•,•) is a 
distance function for probability measures, e.g. total variation distance, or 
V— norm distance) seems not appropriate for such a natural goal as (e — 
a)— approximation. The optimal burn-in time can be much smaller then if 
and in particular cases it can be 0. Also we would like to emphasize that in 
the typical drift condition setting, i.e. if X is not compact and the target 
function / is not bounded, the V— norm should be used as a measure of 
convergence, since \ \n t — n\\ tv — > does not even imply ir t f — > irf. 

Next we suggest an alternative estimation scheme that allows for sharper 
bounds for the total simulation cost needed to obtain (e — a)— approximation 
for small a. We will make use of the following simple lemma taken from the 
more complicated setting of [Niemiro fc Pokarowski 2007] . 

Lemma 5.4.3. Let m £ N be an odd number and let Ix,...,I m be inde- 
pendent random variables, such that P(\Ik — I\ < e) > 1 - a > 1/2, for 
k = 1, . . . , m. Define I := med{ii, . . . , I m }. Then 




and 



n > n(t). 



This completes the proof. 



□ 



^(|/-/| <e)>l-a, if m> 



21n(2a) 



(5.23) 



ln[4a(l - a)]' 
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Proof. Since P{\Ik — I\ > e) < a < 1/2, by elementary arguments we obtain 



p(i/-/i> £ ) < e ;>-(i 

fc=(m+l)/2 ^ 



a) n_fc 



< 2 m ~ 1 a m/2 (l-a) m/2 

1 f 772 ^ 

= 2 exp l y ln ( 4a ( 1_a ))j- 

The last term does not exceed a if m > 21n(2o;)/ln[4a(l — a)}, as claimed. □ 

Hence (e — a)— approximation can be obtained by the following Algorithm 
15.4.41 where Theorem 15.4.11 should be used to find t and n that guarantee 
(e — a)— approximation and m results from Lemma [5A3l 

Algorithm 5.4.4. 

1. Simulate m independent runs of length t + n of the underlying Markov 
chain, 

Y^ Y^ h — 1 m 

^0 '•••' yi -t+n-l' & — ±, . . . , ffl. 



2. Calculate m estimates of I, each based on a single run. 

t+n— 1 

m. 



1 t+n— 1 



n 

i=t 



5. For the final estimate take 

I = med{J 1; . . . ,I m }. 

The total cost of Algorithm 15.4.41 amounts to 

C = C(a) = m(t + n) (5.24) 

and depends on a (in addition to previous parameters). The optimal a can 
be found numerically, however it is worth mentioning a = 0, 11969 is an 
acceptable arbitrary choice (cf. [Niemiro &; Pokarowski 20 07J). A closer look 
at equation (15.241) reveals that the leading term is 

= 1 ( 21n{(2a)- 1 }7rK||/ c n 2 / p / 2M rl , 

aln{[4a(l-a)]- 1 } \ e 2 \ 

where b is defined by ( 15.151) . Function aln{[4a(l — a)] -1 } has one maximum 
on (0,1/2) at a « 0, 11969. 
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5.5 A Toy Example - Contracting Normals 

To illustrate the results of previous sections we analyze the contracting nor- 
mals example studied by Baxendale in |Baxendale 2005] (see also [Roberts &: Tweedie 19 99J. 
[Roberts fc Rosenthal 1997a] and [Rosenthal 1995a| ). where Markov chains 
with transition probabilities P(x, •) = N(9x, 1 — 9 2 ) for some parameter 
9 G (—1, 1) are considered. 

Similarly as in [Baxendale 2005] we take a drift function V(x) = 1 + 
x 2 and a small set C = [— c, c] with c > 1, which allows for A = 9 2 + 
2 ^~J; - < 1 and K = 2 + 9 2 {c 2 — 1). We also use the same minorization 
condition with v concentrated on C, such that (3u(dy) = min. rg c'(27r(l — 



9 2 ))-y 2 eM- { 40))dy- This yields = 2[<&(^) - H^)), where $ 



denotes the standard normal cumulative distribution function. 

Baxendale in [Baxendale 2005) indicated that the chain is reversible with 
respect to its invariant distribution 7r = N(0, 1) for all 9 G (—1, 1) and it is 
reversible and positive for 9 > 0. 

Moreover, in Lemma l5.5.1l we observe a relationship between marginal dis- 
tributions of the chain with positive and negative values of 9. By C(X n \X , 9) 
denote the distribution of X n given the starting point X and the parameter 
value 9. 

Lemma 5.5.1. 



Proof. Let Zi, Z 2 , . 




(5.25) 



n 



C(X n \X , 9) 



c(9 n X + 9 n - k Vl :r 9 2 Z k S j 



fc=l 



n 



fc=i 



£(X n \(-irx ,-9), 



and we used the fact that Z k and — Z k have the same distribution. 



□ 



For 9 < using Lemma l5.5.1l and the fact that V(x) = 1+x 2 is symmetric 



we obtain 



\\C{X n \X ,9)-7T\\ v = \\C(X n \(-irX ,-9)-n\\ v <M 1 n V((-l) n X ) 

= M 1 n V(X ) = M^il + X 2 ). 
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Thus for all 6 G (—1, 1) we can bound the V— norm distance between tt 
and the distribution of X n via Theorem 15.2.41 with p and M = M(y), 
where 7 G (p, 1), computed for reversible and positive Markov chains (see 
Appendix 15.8.31 for formulas). The choice of V(x) = 1 + x 2 allows for 

Table [5751 - bounds based on Baxendale's V— uniform ergodicity constants. 
\9\ e a p P2 7 72 M M 2 m t n total cost 

.5 .1 1 .895 .899 .915 .971 36436 748 1 218 6.46e+09 6.46e+09 
.5 .1 1(T 5 .895 .899 .915 .971 36436 748 1 218 6.46e+13 6.46e+13 
.5 .1 1(T 5 .895 .899 .915 .971 36436 748 27 218 5.39e+09 1.46e+ll 

(e — a)— approximation of J x f{x)ix{dx) if \ f 2 \v < 00 for the possibly un- 
bounded function /. In particular the MCMC works for all linear functions 
on X. We take f(x) = x where |/ 2 |y = 1 as an example. We have to provide 
parameters and constants required for Theorem 15.4. 1L In this case the opti- 
mal starting point is X = since it minimizes V(x) = 1 + x 2 . To bound nV 
we use Lemma 15.2.81 and Lemma 15.2.91 yields a bound on ||/ c | 2 |y P = \fc\v- 

Examples of bounds for t and n for the one walk estimator, or t, n and 
m for the median of multiple runs estimator are given in Table 15.51 The 
bounds are computed for c = 1.6226 which minimizes p2 (rather than p) 
for 6 = 0.5. Then a grid search is performed to find optimal values of 7 
and 72 that minimize the total simulation cost. Note that in Baxendale's 
constant M depends on 7 and M goes relatively quickly to 00 as 7 — > p. 
This is the reason why optimal 7 and 72 are far from p and p<i and this 
turns out to be the main weakness of Baxendale's bounds. Also for small 
a = 10~ 5 we observe a clear computational advantage of the median of 
multiple runs estimation. The m = 27 shorter runs have significantly lower 
total cost then the single long run. R functions for computing this example 
and also the general bounds resulting from Theorem 15.4.11 are available at 
http:/ /akson. sgh.waw.pl/~klatus/ 

5.6 The Example - a Hierarchical Random Ef- 
fects Model 

In this section we describe a hierarchical random effects model which is a 
widely applicable example and provides a typical target density tt that arises 
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in Bayesian statistics. Versions of this model and the efficiency of MCMC 
sampling have been analyzed e.g. by Gelfand and Smith in |Gelfand &: Smith~1 990j. 
Rosenthal in [Rosenthal 1995a| . [Rosenthal 1995b| and many other authors. 
In particular Hobert and Geyer in |Hobert fc Geyer 1998| analyzed a Gibbs 
sampler and a block Gibbs sampler for this model and showed the under- 
lying Markov chains are in both cases geometrically ergodic (we describe 
these samplers in the sequel). Jones and Hobert in [Jones &: Hobert 2 004J 
derived computable bounds for the geometric ergodicity parameters and con- 
sequently computable bounds for the total variation distance •) — Tr\\ tv 
to stationarity in both cases. They used these bounds to determine the 
burn-in time. Their work was a breakthrough in analyzing the hierarchical 
random effects model, however, mere bounds on burn-in time do not give a 
clue on the total amount of simulation needed. Also, bounding the total vari- 
ation distance seems inappropriate when estimating integrals of unbounded 
functions, as indicated in Remark I5.4.2L In this section we establish the 
(e — a)— approximation for the hierarchical random effects model. This con- 
sists of choosing a suitable sampler, establishing the Drift Condition 15.2.11 
with explicit constants, computing V— uniform ergodicity parameters, and 
optimizing lower bounds for t and n in case of estimation along one walk 
or for t, n and m in (I5.24K for the median of shorter runs. This may turn 
out to be a confusing procedure, hence we outline it here in detail, discuss 
computational issues and provide necessary R functions. 

5.6.1 The Model 

Since we will make use of the drift conditions established by Jones and Hobert 
in [Jones fc Hobert 2004] we also try to follow their notation in the model 
description. Let /i and Xg be independent and distributed as 

fi ~ N(m , Sq 1 ) and Xg ~ Gamma(ai, b\), 

where m G M, s > 0, cq > 0, and b\ > are known constants. 

At the second stage, conditional on /i and Xg, random variables 9i, . . . 9 k 
and A e are independent and distributed as 

9i\fi, Xg ~ N(n, Xq 1 ) and A e ~ Gamma(a 2 , b 2 ), 

where a 2 > 0, b 2 > are known constants. 
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Finally in the third stage, conditional on 6 = {9i, ... ,9k) and A e , the 
observed data y = {Yij} are independent with 

Y ij \9,K^N{9 i ,\~ 1 ), 

where i = 1, . . . , K and j = 1, . . . , rrii. 

The Bayesian approach involves conditioning on the values of the observed 
data {Yij} and considering the joint distribution of all K +3 parameters given 
this data. Thus we are interested in the posterior distribution, that is, the 
following distribution defined on the space X = (0, oo) 2 x M. K+1 , 

C{e u ...,e K , f i i X 0} X e \{Y ij }) = ir(9,n,X\y) (5.26) 

oc d(y\9, X e )d(9\fi, \ e )d(\ e )d(X e )d(^) = * 

where d denotes a generic density and hence the final formula for the unnor- 
malised density takes the form of 



K K mi 

X 

i=l i=l j=l 



J\ l\ '171 

[e-Wft-zO";^] x -q j| j e _! Aefe _^ A i /2 j ^ (5 _ 27) 



and we have to deal with a density that is high-dimensional, irregular, strictly 
positive in X and concentrated in the „center" of X, which is very typical 
for MCMC situations [Roberts fc Rosenthal 2005] . Computing expectations 
with respect to n(9, fi, X\y) is crucial for bayesian inference (e.g. to obtain 
bayesian estimators) and requires MCMC techniques. 

5.6.2 Gibbs Samplers for the Model 

Full conditional distributions required for a Gibbs sampler can be computed 
without difficulty. Let 

Vi'= — y^Vij, M:=Vm i5 ^T^XX 
rrii ' K / — ' 

3=1 i i 



A 



9-i := {9\, ... , 9 i+ i, . . . , 9k), vi(9, fi) := — nY 



i=i 
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K 

u 2 (6) := - m) 2 , SSE := ( Vij - y^ 2 



Now the conditionals are 



Xe\9,ii, X e ,y ~ Gamma + a u Vl ^' ^ + bA , (5.28) 

, x ^ ^ M u 2 (e) + ssE \ . 

X e \0,li,Xg,y ~ Gamma I — + o 2 , h o 2 J , (5.29) 

a i/i \ \ AT f Xefi + rriiXeyi 1 \ 
ei\e-i,id,Xe,X ej y ~ iV — - — ■ — ,- — ■ — , (5.30) 

V Xe + rriiXe X e + miX e J 

li\9,Xg,X e ,y ~ AM — - — , — — — . 5.31 

V s + KA<j s + K X J 

Gibbs samplers for the variance components model and its versions have been 
used and studied by many authors. We consider the two Gibbs samplers 
analyzed by Jones and Hobert in [Jones &: Hobert 20 04 . 



The fixed-scan Gibbs sampler that updates fi, then 9 = . . . 9k), 
then A = (Ae,A e ). Note that 0j's are conditionally independent given 
([A, A) and so are A<j and A e given (9,fi). Thus the one step Markov 
transition density (//, 9', A') — ► (/i, 9, A) of this Gibbs sampler is 



KM,A|//,0',A') = d(ji\e , ,X',y) 



K 



Y\_d(6i\n, X',y) 



i=l 



(5.32) 



x d(X e \9, fj,,y)d(X e \9, n,y). 



Where d denotes a generic density and y = {Yij}, i = 1, . . . , K; j = 
1, . . . rrii, is the observed data. 

Hobert and Geyer in |Hobert fc Geyer 1998| introduced a more effi- 
cient block Gibbs sampler (also analyzed by Jones and Hobert in 
[Jones fc Hobert 2004| ). in which all the components of 

Z = {e 1 ,...e K ,n) 

are updated simultaneously. It turns out that 

i\X,y ~jV(f,£) where C = C(X,y) and £ = E(A,j/). 
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Thus the one step Markov transition density (A',£') — > (A, £) of the 
block Gibbs sampler is 



p (a, ei a', o = d(A e ie', s/)d(A e ie', 2/)rf(eiA, y). 



(5.33) 



We give now the formulas for £* and £ derived in |Hobert fc Geyer 1998|. 
Let 

n),XoX, 



T 



i=l 



'+m,i\ e 



then 



A' 



miX e X e yi 



and 



SO + T 

6— 

Aq£(//|A) 

Afl + TTU X e 



1 

A e + rriiX t 



" Ag +mi A e 

mXeVi 



+ rn s 



1 + 



A! 



(X e + mjA e )(s + r) 



A^ 



E(»\X) 
E(6i\X) 

Var(6 t \X) 
Cov(6 i ,6 j \X) 
Cov(9 h 9 j \X) 
Var(n\X) 

S + T 

5.6.3 Relations between Drift Conditions 

A crucial step for (e — a)— approximation is establishing the drift condition 
15.2.11 which in the sequel will be referred to as the Baxendale-type drift 
condition. To this end we use the Rosenthal-type (cf. [ Rosenthal 1995b| ) and 
Roberts-and-Tweedie-type (cf. [Roberts fc Tweedie 1999| ) drift conditions 
established by Jones and Hobert in [Jones fc Hobert 200 4] combined with 
their type of a small set condition. 

In the following definitions and lemmas P denotes the transition kernel of 
the Markov chain (X n ) n ^ and the subscripts of drift condition parameters 
indicate the type of drift condition they refer to. 



X e + miX e )(X e + m i A e )(s + r) 
X 2 



X e + miX e ) (Xe + rrij A e ) (s + r) ' 
1 
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Assumption 5.6.1 (The Rosenthal-type drift condition). 

(R-l) There exists a function Vr : X — > [0, oo) and constants < < 1 and 
Kr < oo satisfying 

PV R (x) < \ R V R (x) + K R . (5.34) 

(R.2) Let C R = {x G X : V R (x) < d R }, where d R > 2K R /(1 - X R ). There 
exists a probability measure Ur on X and (3r > 0, such that for all 
x G C R and A G B(X), 

P(x,A)>p R v R (A). (5.35) 

Assumption 5.6.2 (The Roberts-and-Tweedie-type drift condition). 

(RT.l) There exists a function Vrt '■ X — * [1, oo) and constants < Xrt < 1 
and Krt < oo satisfying 

PVr T (x) < \rtVr T (x) + K rt I Crt (x), (5.36) 

where C RT = {x G X : V RT (x) < d RT }, and d RT > - 1. 

(RT.2) There exists a probability measure vrt on X and (3rt > 0, such that 
for all x G Crt and A G B(X), 

P(x,A) >Prtv rt (A). (5.37) 

The following lemma relates the two drift conditions. 

Lemma 5.6.3 (Lemma 3.1 of [Jones &: Hobert 2004| ). Assume that the Rosenthal- 
type drift condition holds. Then for any d > the Roberts-and-Tweedie-type 
drift condition holds with parameters 

Vrt = V r +1, Xrt = Xrt(oI) = ^ , K RT = K R +1-X R , (3 RT = (3 R , 

Crt = C RT {d) = lx G X : V RT {x) < + l ^ RT \ an d Urt = Ur , 
L a{i — Art) ) 

The Baxendale-type drift condition we work with results from each of the 
above conditions and the following lemma is easy to verify by simple algebra. 
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Lemma 5.6.4. If the Rosenthal-type or the Roberts- and- Tweedie-type drift 
condition holds, then the Baxendale-type drift condition (A. 1-2) verifies with 



V = V RT = V R + 1, A = X(d) = X RT = 

a + 1 



v = vrt = vr, C = C(d) = C RT , (3 = (3 RT = f3 R , 

d 2 + 2d + X R 



K = K(d) = K RT + X RT d RT = (K R + 1 - A 



R 



d(l - Aj 



Observe next that integrating each of the drift conditions yields a bound 
on nV similar to the one obtained in Lemma 15.2.81 and the best available 
bound should be used in Theorem 15.3.41 and Theorem 15.4.11 In particu- 
lar, if the Baxendale-type drift condition is obtained from the Roberts-and- 
Tweedie-type drift condition via Lemma 15.6.41 integrating the latter always 
leads to a better bound on nV. Also, if one starts with establishing the 
Rosenthal-type drift condition, the value of d used for bounding tcV does 
not have to be the same as the one used for establishing the Baxendale- 
type drift and minorization condition and it should be optimized. Moreover 
jzf^ + 1 < i^Xr T < T^T f° r ever y d > 0. This leads to the following lemma 
which can be checked by straightforward calculations. 



Lemma 5.6.5. Provided the drift functions are as in Lemma \5. 6.4\ the bound 
on nV can be optimized as follows 



Krt 1 Kr \ _Kr_ 
-X RT {d)\'l-X R J - 1-A fl 



tcV < mini inf \jt{C RT {d)) x . „ \ < -h 1. 



(5.38) 



5.6.4 Drift and Minorization Conditions for the Sam- 
plers 

For the fixed-scan Gibbs sampler and the block Gibbs sampler of Section 
15.6.21 Jones and Hobert in [Jones fc Hobert 2 004J (Section 4 and 5 therein) 
obtained the following drift and minorization conditions. See their paper for 
derivation and more elaborative commentary of these results. 
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Drift and Minorization for the block Gibbs Sampler 

Assume m! = min{mx, . . . , m K } > 2 and K > 3. Moreover define 

K 

2ai + K-2 2a 2 + M-2 



81= , 8 2 = , 6 3 =(K+1)5 2 , 5 4 = 6 2 y^m7 1 , 

i=l 



rx jm 2fe i 2b 2 + SSE 

= maxjdi, d 3 |, d = — — — -, c 2 



2a 1 + K-2 2a 2 + M-2 

Observe that < 5i < 1 for i — 1, 2, 3, 4. Also let A denote the length of the 
convex hull of the set {f/i, . . . ,yx, m }. 

Proposition 5.6.6 (Drift for unbalanced case). Fix Xr G (5,1) and let 0i 
and 02 be positive numbers such that + 5 < Xr. Define the drift function 
as 

/x) = <Mi(0, /i) + <p 2 ^{0), (5.39) 

where vi(Q,fj,) and v 2 (6) are defined in Section l5. With this drift function 
the block Gibbs sampler satisfies the Rosenthal-type drift condition with 

K R = 0i [ci + c 2 / + iTA 2 ] + 2 [c 2 (K + 1) + MA 2 ] . (5.40) 

A better drift condition can be obtained in the balanced case, when rrii = 
m > 2 for i = 1, . . . , K. Let S 5 = KS 2 . 

Proposition 5.6.7 (Drift for balanced case). Fix Xr G (5, 1) and let be a 

positive number such that 0^ + 5 < Xr. Define the drift function as 

V 2 (6, f i)=<f ) u 1 (e,fi) + m- 1 u 2 (6). (5.41) 

With this drift function the block Gibbs sampler satisfies the Rosenthal-type 
drift condition with 

K 

c 2 



K R = 0ci + (<f)K + K + l)-^ + max{0, 1} )^max {(y - y^) 2 , (m - yi) 2 }, 

(5.42) 



m 

i=l 



where y := K 1 J2i=i V 
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Proposition 15.6.81 (Proposition 4.1 of [Jo nes fc Hobert 2004| ) provides a 
minorization condition for the Rosenthal-type drift-minorization condition 
for the block Gibbs sampler for both, the balanced and unbalanced case. Note 
that the balanced case drift function V2 is a special case of the unbalanced 
drift function Vx, hence we focus on V\. 

Now consider the candidate C R = {{8,fi) : Vx(8,fi) < d R } for a small 
set. Note that C R is contained in Sb = Sb 1 H Sb 2 , where S'b 1 = {(9,fi) : 
vi(0,n) < d R /4>i\ and Sb 2 = {(8, fi) : 1*2(8) < d R /4> 2 }. Hence it is enough to 
establish a minorization condition that holds for Sb- 

Let T(a,P;x) denote the value of the Gamma(a,/3) density at x and 
define functions hx(Xg) and h 2 (X e ) as follows: 



hi(X e 



r( -j + ax, bx, Xg), Xg < Xg, 

r (f + ax,^ + bx;Xg), Xg > Xg, 



where 



h 2 {X e 



M K + 2a i) 1 t-\ d R \ 
Xe= d R lQg ( 1 + ^ 

T(f + a 2 ,^ + 6 2 ;A e ), X e < A*, 

r(f + a 2 , d ^f 2 SE + b 2 -X e ), X e >X* e , 

2 (M + 2a 2 ) ( d R ^ 

K = T R lQg ( 1 + 2 (26 2 + ^) )- 

Now define a density q(X, 8, fi) on R 2 + x R K x R by 



and 



where 



Ir+ hx(Xe)dX e J V h 2 (X e )dX 



where g?(£|A, y) is the normal density in (15.331) resulting from the block Gibbs 
sampler construction. Next define 



I3r= \Jr hl ^ dXe ){J R h 2(X e )dX, 

Also recall p(X, £|A', £') = p(X, 8, /i|A', 8', fjf), the Markov transition density of 
the block Gibbs sampler as specified in (I5.33j) . 

We are in a position to state the minorization condition. 
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Proposition 5.6.8 (Minorization Condition). The Markov transition den- 
sity for the block Gibbs sampler satisfies the following minorization condition: 

p(\,6^\\',6'^')>P R q(\,6^) for every eS B . (5.43) 

Drift and Minorization for the fixed-scan Gibbs sampler 

As before assume that K > 3 and 

2 < m! = min{mi, . . . , ttik} < maxjmi, . . . , ttik} = m". 

Define 

. K 2 + 2K ai 1 
06 = — — — and 07 



2s + K 2 + 2K ai 2(ai - 1) 

Clearly o" 6 G (0, 1) and if a x > 3/2 then also 5 7 G (0, 1). Moreover if a x > 3/2, 
then since 2s oi > 0, there exists p\ G (0, 1) such that 



(K+ 5 fy i<Pl . (5.44) 



6, 

h 

Define also 

K 

(0-v) 2 and .s 2 = ' 

s + KX g 



v 3 (6,\)= KX ° {6-yf and s 2 = ^(Vi ~ V? ■ 



i=i 



Proposition 5.6.9 (Drift Condition). Assume that a\ > 3/2, 5m' > m" and 
let pi G (0, 1) satisfy j5.44\ )- Fix 

c 3 G (0, min{6i, b 2 }) and X R G (max{pi, 8 6 , 5 7 }, 1). 

Define the drift function as 

V 3 (9, A) = e c ^ + e c ^ + -A- + v 3 (9, A). (5.45) 

KdiAg 

With this drift function the fixed-scan Gibbs sampler satisfies the Rosenthal- 
type drift condition with 

K R={7 ) +(7 ) 2 +(S 6 + 5 7 )[— + {m -y) +—\+——. 

&1-C3 7 '02-C3 7 L S A 

(5.46) 
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We now turn to the minorization condition for the fixed-scan Gibbs sam- 
pler provided in Section 5.2 of [Jones &: Hobert 2004] . Similarly as before, 
consider the candidate C R = {(9, A) : V 3 < d R } for a small set and let 

c 4 = — 7 — , c l = y-y/(m - y) 2 + d R and c u = y+ ^/(m - y) 2 + d R . 
Kd x a R 

The minorization condition will be given on a set Sq such that 

Cr Q Sq = Sd n Sg 2 n Sg 3 , 



where 



S<h = { 


[(M) 


Sg 2 = \ 


[(M) 


Sg 3 = \ 





, log d R 

c 3 
log d R 

c 3 

s m + KX g 9 



A : Q < — — < c u k 

Sn + AAfl J 



s + KX 6 

Moreover to assure that Sqi H Sg 2 is nonempty, choose such that 

c 3 5 7 



d R \ogd R > 



K8 X ' 



Let N((, a 2 ; x) denote the value of the N((, a 2 ) density at x and define 
functions #) and ^(/-O as follows: 

(9) = (g) f ex P { - ^ £ [(ft - + mi (9 % - y,f] } 



and 

iV(c u ,[ So + ^^]" 1 ;/i), A*<y, 



9M 



C3 
C3 

K w D2 



Now define a density oni?x R x i?i by 



Ma J^x 9)g 2 (fi)d9dfiJ 
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where d(X\fi,9,y) is the joint Gamma distribution of Xg and A e in (15.321) 
resulting from the fixed-scan Gibbs sampler construction. Next define 



R 



SQ + Kd \l/2 



s + 



K log d R 




R JR K 



gi(fi,9)g 2 (fx)d9dn). 



Also recall 6 1 , A|/i', 6 1 ', A'), the Markov transition density of the fixed-scan 
Gibbs sampler as specified in (15.32p . We are in a position to state the mi- 
norization condition. 

Proposition 5.6.10. The Markov transition density for the fixed-scan Gibbs 
sampler satisfies the following minorization condition 



p(fj,, 9, A|/i', 9', A') > j3 R q((jL, 9, A) for every (9', A') G S G . 



(5.47) 



Moreover Jones and Hobert in [Jones fc Hobert 2004] obtained a closed 
form expression for f3 R in ( 15.471) involving the standard normal cumulative 
distribution function $. Let 



. log d R / A 

i=l 



mi 



m u 



QS + 



c 3 

log d R 

c 3 

log d R 



8=1 
K 



c u s H Kc u + > — 

c 3 V + 



Then 



0. 



R 



K_ 

C4C3 \ 2 



logd 



R 



cxp 



n 1 



exp 



log rf« yfm. 



2c, 



+ mi 



cjs _ Kc\ log ro£) / y-ro,, 
2 2c 3 2v]\JD 



cfso Kef log d R mf 
^ -~2 +2^7 
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5.6.5 Obtaining the Bounds 

We focus on obtaining the bounds for (e — a)— approximation for bayesian 
estimators of parameters /i, X$, X e and 0$. This involves integrating one di- 
mensional projections of the identity function on parameter space. The drift 
function V has to be at least of order f 2 since \f 2 \v has to be finite. Note 
that for the two described samplers different drift conditions has been estab- 
lished and neither of them majorizes quadratic functions in all the parame- 
ters. Thus specifying a parameter, say A e implies the choice of the fixed-scan 
Gibbs sampler with the drift function V3, whereas for \x the block-scan Gibbs 
sampler with drift function V\ or V2 is the only option. 

Once the sampler and the type of the drift condition is chosen, the 
user must provide his choice of \r,4> and <1r for the Rosenthal-type drift- 
minorization condition. The next step is the right choice of d in Lemma r5.6.4l 
which yields the parameters of the Baxendale-type drift condition. Provided 
the Baxendale-type drift condition is established with computable param- 
eters, there are still four parameters left to the user, namely the mutually 
dependent 7 and M in Baxendale's Theorem 15.2.41 and their counterparts 72 
and M 2 from Corollary 15.2.71 Unfortunately the bounds on t and n or t, n 
and m are very complicated functions of these parameters subject to users 
choice and finding optimal values analytically seems impossible. Also, in our 
experience, small changes in these quantities usually result in dramatically 
different bounds. 

Similarly as burn-in bounds in [Jones fc Hobert 2004] . final bounds for 
(e — a)— approximation also strongly depend on the hyperparameter setting 
and the observed data. 

Thus we provide appropriate R functions for approximating optimal bonds 
on the simulation parameters. This functions are available on http: / / akson.sgh.waw.pl/~klatus/ 

5.7 Concluding Remarks 

To our best knowledge, in the above setting of an unbounded target func- 
tion / and without assuming uniform ergodicity of the underlying Markov 
chain (which in practice means the state space X is not compact) we de- 
rived first explicit bounds for the total simulation cost required for (e — 
a )~ approximation. These bounds are sometimes feasible and sometimes in- 
feasible on a PC, and probably always exceed the true values by many orders 
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of magnitude. Although 10 iterations in our Toy Example takes about 1 
minute on a standard PC, sampling more realistic chains will take more time 
and the bound will be even more conservative. 

However, the message of the Chapter is a very positive one: the current 
theoretical knowledge of Markov chains has reached the stage when for many 
MCMC algorithms of practical relevance applied to difficult problems, i.e. 
estimating expectations of unbounded functions, we are able to provide a 
rigorous, nonasymptotic, a priori analysis of of the quality of estimation. This 
is much more then the often used in practice visual assessment of convergence 
, more sophisticated a posteriori convergence diagnostics, bounding only burn 
in time or even using asymptotic confidence intervals. 

We also notice the following: 

• The leading term in the bound for n is b = nV ^ v (1 + 2 j^^ 2 ) (where 
we took p = r = 2 for simplicity). TrV\f 2 \v should be of the order 
of Var-xf, thus this term is inevitable. e~ 2 results from Chebyshev's 
inequality, since we proceed by bounding the mean square error, or 1 
can be reduced to log(a _1 ) for small a by Lemma 15.4.31 and Algorithm 
15.4.41 which in fact results in an exponential inequality. The last term 
1 + 2 1 M272 is of the same order as a general bound for the ratio of the 

1-72 ° 

asymptotic variance and the stationary variance, under drift condition 
and without reversibility as indicated by Theorem 15.3.41 Thus it also 
seems to be inevitable. However we acknowledge this bound seems to be 
very poor due to the present form of V— uniform ergodicity constants. 

• The term 1 + ^^ is the bottleneck of the approach. Here good bounds 
on 7 and the somewhat disregarded M{pf) are equally important. Im- 
provements in Baxendale-type convergence bounds may lead to dra- 
matic improvement of the bounds on the total simulation cost (e.g. by 
applying the preliminary results of [ Bednorz 20 08J). 

• Improvements of drift parameters (i.e. establishing better drift func- 
tions and minorization conditions) imply significant improvement of 
the convergence bounds in Baxendale's Theorem. 

• The drift conditions we used as well as the Baxendale's theorem are far 
from optimal and subject to improvement. 

• We applied the theoretical results to the toy example of Section 15.51 
where the drift and minorization conditions are available without much 
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effort and to the Hierarchical Random Effects Model with drift and 
minorization conditions established in [Jones fc Hobert 2004) . Even 
more general models are feasible in this setting, in particular in the 
recent paper [lj Johnson and Jones established drift and minorization 
conditions for a bayesian hierarchical version of a general linear mixed 
model. 

• Establishing drift conditions might be difficult. A good first try may be 
V(x) proportional to 7r(x)~ 1//2 or to some suitable quadratic function. 

5.8 Appendix - Formulas for p and M 

In the sequel the term atomic case and nonatomic case refers to $ — 1 and 
/3 < 1 respectively. If (3 < 1, define 



5.8.1 Formulas for general operators 

For /3 > 0, R > 1 and L > 1, let Ri = Ri(f3,R,L) be the unique solution 
r G (1, R) of the equation 




1 + ( log -j) /(log A x ), otherwise. 



if v{C) = 1, 

if u(C) + j cc Vdv < K, 



Then let 





r(log(-R/r)) 2 



r — 1 



e 2 (3{R- 1) 
8(i-l) 



and for 1 < r < define 



Ki{r,P,R,L) 



2(3 + 2(\ogN)(\og(R/r))- 1 - 8Ne~ 2 (r - l)r- 1 (log( J R/r)) 
(r -1)\J3- 8Ne~ 2 (r - ljr-^log^/r))- 2 ] 



-2 



where iV = (L — 1)/(R — 1). 



81 



For the atomic case we have p = 1/R\{(3, X l , A and for p < 7 < 1, 

M = max(A ' * 7 A/7) + ^ ( f " A /, 7) ^ ( 7 - \ /3, A- 1 , A" 1 AT) 
7 — A 7(7 — A) 

(A--A/ 7 )max(A > A--A) A(flT-l) 

( 7 -A)(l-A) + ( 7 _A)(1-A)' l °- 5J 

For the nonatomic case let i? = argmaxK^^ Ri((3, R, L(R)). Then we have 
p = -R, L(_R)) and for p < 7 < 1, 

M = 7-" 2 ~ 1 (^7-A) x f pmax(\,K-\) | (l-|9)(r 



-1 



( 7 -A)[l-(l-/3) 7 - Q1 ] 2 V 1 -A 7 
| 7 -^A(AT-l) + KiKy - A - /% - A)] 



#2 



;i - A) (7 - A)[l - (1 - 7 2 (7 - A)[l - (1 - Ph-^} 

+ K i i*x)a-~ 1 ) ) ( (7 ~ Q2 " 1} + (1 ~ ~ p)il ~ ai - • (5 - 49) 

5.8.2 Formulas for self-adjoint operators 

A Markov chain is said to be reversible with respect to ti if f x Pf(x)g(x)ir(dx) = 
fx f ( X )P Qip^^i.dx) for all /, g G L 2 (n). For reversible Markov chains the fol- 
lowing tighter bounds are available. 
For the atomic case define 

min{A-\r s }, if K > X + 2/3, 
A -1 , if K < X + 2/3, 

where r s is the unique solution of 1 + 2/3r = r 1+ ( logi ^( logA ). Then p = R^ 1 
and for p < 7 < 1 take M as in (15.481 ) with K^" 1 , (3, A -1 , A _1 K) replaced 
by K 2 = 1 + 1/(7 -p). 

For the nonatomic case let 

r„ if L(i2o) > 1 + 2(3R , 
R , if L(i? ) < 1 + 2pR , 

where r s is the unique solution of 1 + 2(3r = L(r). Then p = i?2 1 and 
for p < 7 < 1 take M as in ( l5~49ft with K^' 1 , (3 , R, L(R)) replaced by 

K 2 = l + y/$/(y-p). 



Ri 
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5.8.3 Formulas for self-adjoint positive operators 

A Markov chain is said to be positive if f x Pf(x)f(x)ir(dx) > for every 
/ G L 2 (n). For reversible and positive markov chains take M's as in Section 
15.8.21 with p = A in the atomic case and p = Rq 1 in the nonatomic case. 
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Chapter 6 



Convergence Results for Adaptive 
Monte Carlo 



Ergodicity results for adaptive Monte Carlo algorithms usually assume time- 
stability of transition kernels. On the other hand, a large class of time- 
inhomogeneous Markov Chains is ergodic. This suggests existence of adap- 
tive MC algorithms which fail to satisfy the time-stability condition but are 
still ergodic. We present a modification of Atchade-Rosenthal ergodicity 
Theorems (3.1 and 3.2 in [Atchade k. Rosenthal 20 05]) that does not assume 
time- stability of transition kernels. We use a weaker path- stability condition 
instead, that results from time- stability condition by the triangle inequality. 

6.1 Introduction 

As before, we deal with computation of analytically intractable integral 



For computational efficiency of the Markov chain Monte Carlo approach, 
the simulated Markov chain should converge to its stationary distribution 
reasonably quickly. This can sometimes be achieved by careful design of 
the transition kernel P of the chain, on the basis of a detailed preliminary 
analysis of it. Intuitively, the more features of ir are known, the better P can 
be designed. So a non-Markovian approach might be to allow the transition 
kernel of the simulated stochastic process (X n ) n > to adapt whenever new 
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features of ti are encountered during the process run. Simulations show that 
this approach can indeed outperform algorithms based on classical ideas. For 
numerous examples and an insight of how to tune the transition kernel "on the 
fly" see [Roberts fc Rosenthal 2006] and references therein. However, since 
in this case (X n ) n >o is not a Markov chain any more, it may fail to converge 
to the expected asymptotic distribution even if each participating transition 
kernel is ergodic and has the same stationary distribution. A simple but 
nonintuitive example is given in Section 16.21 Difficulty to obtain general 
ergodicity results appears to be the main problem in adaptive Monte Carlo. 

For versions of adaptive MC and related work we refer to e.g. [Fishman 1 996J . 
[Evans 1991] . [Gelfand fc Sahu 1994] . In more recent papers [Gilks at al. 1 998] 
showed adaptation of the transition kernel can be performed (without damag- 
ing the ergodicity of the algorithm) on regeneration times. The idea of adap- 
tive MC through regeneration was then investigated in [Brockwell fc Kadane 2005] 
and Sahu fc Zhigljavsy 2003| . Convergence results in fairly general setting 
have been derived in [Haario et al. 200T] which was followed by refined the- 
orems in [Atchade fc Rosenthal 2005] and a discrete state space version of 
those results presented in [Kohn fc Nott 2005] . 

In each of the above mentioned papers ergodicity results either on regen- 
eration times, or fit within the so called diminishing adaptation framework 
and assume the time-stability condition for transition kernels. Yet the exis- 
tence of ergodic inhomogeneous Markov chains suggests the time-stability of 
transition kernels is not necessary for ergodicity of adaptive MC algorithms. 
After introductory examples in Section 16.21 in Section 16.31 we give ergodicity 
theorems that use a weaker path-stability condition, which results from the 
time- stability condition by triangle inequality. However we have to pay the 
price for it and formulate the uniform ergodicity condition in the time in- 
homogeneous setting, which makes it more complicated then in the original 
Atchade and Rosenthal's theorems. In Section 16.41 we prove the main result 
of this Chapter. 



6.2 One Intuitive and One Not- so- Intuitive Ex- 
ample 

We begin with a simple example where we briefly analyze two stochastic 
processes using the same two transition matrices. 
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Consider the state space X = {0, 1} and ir, the uniform distribution on 
X. Let 

+ eP\ for some e > 0. 

Note that n is the stationary distribution for both, Pi and P 2 . Let p be 
some probability distribution on {P 1 ,P 2 }. Let P(°\ P^\ P^ 2 \ ... be an iid 
sample from p. In the sequel we will use the convention min0 = oo and 
max0 = — oo. 

Example 6.2.1. Let (X n ) n > be a stochastic process with an initial distri- 
bution po, evolving in step k according to the transition matrix p( k \ (X n ) n > 
is clearly an in-homogeneous Markov Chain and p n (the distribution of X n ) 
converges to the stationary distribution ir: let U n := {k : k < n, P^ = Pi} 
and u n = maxU n . The distribution of X n , given u n ^ — oo is ir, so we have 
the following bound on the total variation distance between p n and n: 

\\p n - n\\tv < P(u n = -oo) n -^? a.s. 

Example 6.2.2. (due to W. Niemiro). Now consider (Y n ) n > with an ini- 
tial distribution qo and an initial transition matrix Qq, evolving for n > 1 
according to the following adaptive rule: 

f Pi if n_i = o 
^ k \ p 2 if r fe _! = i 

Note that after two consecutive 1 (and this occurs with probability at least 
| for any k, k + 1) Y n is trapped in 1 and can escape only with probability 
e. Let q~i = lim n ^ 00 P(F n = 1) and qo = lim n _ >00 P(F„ = 0). Now it is clear, 
that for small e we will have q~i ^> % and the procedure fails to give the 
expected asymptotic distribution. 

Both processes (X n )„> and (Y n ) n > are allowed to use essentially different 
transition matrices in two consecutive steps. But one of them converges to 
the desired distribution n and the other one fails to converge. In our opinion 
it is not the "time stability" condition, that is crucial for convergence of an 
adaptive Monte Carlo algorithm. It is the "path-stability" condition, that 
reads "if the path is similar, the transition kernel should be similar as well". 
Obviously (X n ) n > satisfies this condition and (Y n ) n > does not. 

In the following section we will try to formalize this intuition. 



Pi 



1/2 1/2 
1/2 1/2 



and Po 
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6.3 Convergence Results 



We will similarly as in (Atchade fc Rosenthal 2005] analyze a stochastic pro- 



cess (X n ) n > on a general state space X, generated by the following algorithm: 

Algorithm 6.3.1. Assuming we have an initial transition kernel P xo and an 
initial point Xq G X, the algorithm proceeds as follows: 

1. If for time n > we have X n = x and a transition kernel P n ^ , which 

is allowed to depend on the path X n = (X Q , . . . , X n ) G X n+1 ; then 
sample from P n x n {x, ■)■ 

2. Use X n+ \ = (Xq, . . . , X n+ i) to build a new transition kernel P n ^ i to 
be used at time n + 1. 

For (X n ) n > generated by Algorithm 16.3. II we shall write P M to denote its 
distribution on (X 00 ,^ 00 ) when X ~ /i, and P M to denote the expectation 
with respect to P M . If fi = 5 X , we usually write E x and P x instead of P M and 
P M . By P^ n we will denote the marginal distribution of X n induced by P M , 
thus P^n is a probability measure on X . To denote two trajectories of length 
n + k+1, that have a common initial part of length n + 1 and then split, we 
will write (x n ,y k ) and (x n ,y' k ). 

We will prove ergodicity theorems similar to Theorem 3.1 and 3.2 in 
[Atchade fc Rosenthal 2005] . but under modified assumptions. 

Assumption 6.3.2. There exist a measurable function V : X — > [1, oo) and 

real number sequences (r n ), (a n ), (R n ), such that (r„), (R n ) — > as n — > oo 
and: 

A.l (uniform ergodicity) For all j > 1, n > 0, x G X and x n G X n+l , there 
exists y'j = (y[, . . . , y'j) and < I < j — 1 such that 

3-1 

I Yl P ^>(*»>vi,---,vi)( x > ') ~ 7r n+i,(i„^,...,j / ;)(-) < RjV(x). (6.1) 

i=0 

A. 2 (path- stability) For all x G X,x n G X n+1 , there exists y' k G A? fc , smc/i 
t/ia^ x n and y' k satisfy ( (#. 1\) with j = k and for all y k G X k , 

\\P n +k,(x n ,y k )(x, •) - Pn+k,(x n ,y>)(x, -)L < ^lT„a fc 1/(x). (6.2) 
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A. 3 For allxe X,x n G X n+1 ,y k e X k , 

\\n n +k,(x n ,y k ) - Kn,x n \\ v < K 2 r n a k . (6.3) 

A. 4 For all n>\, 

J V 2 {x n )P^ n {dx n ) = (6.4) 

dx n ) . . . P Q ,s (a;o, etei) < if 3 \/ 2 (x ) 

and 

supvr ni£n (V r ) < oo. (6.5) 

n,x„ 

A. 5 For any finite constants C\ y c-i, define 

B{c ll c 2 ,n) := min (ci0 fc r n _ fc + c 2 R k ), 

l<k<n 

where (j) n = X/k=i a &- Assume that B(c\, c 2 , n) = 0(-^) for some e > 0. 

Under these assumptions we will prove two ergodicity theorems: 

Theorem 6.3.3. Let (X n ) n > be the stochastic process generated by Algo- 
rithm HUTU with X = xq. Under A.1-A.4 there exist constants k\,k 2 < 00 
such that for any measurable function f : X — > R with \ f\ < V , 

\E X0 (f(X n )-n n!jtn (f))\ < B{k u k2,n)V(xo). (6.6) 

Theorem 6.3.4. Under A.1-A.5, for any measurable function f : X — > R 
and \ f\ < V , for any starting point xo G X, 

^ n— 1 

- ^2 (f( x d ~ ^hxiif)) °> «5 n ^ 00, P x . - a.s. (6.7) 

n i=0 

Remark 6.3.5. 1. If 7r n in = it, as it usually occurs in Monte Carlo setting 
(•7T is the invariant target distribution), then Theorem 16.3.31 gives a 
bound on the rate of convergence of the distribution of X n to n and 
Theorem 16.3.41 provides a law of large numbers type result. 
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2. In this typical case (ir n >Sbn = 7r) the theory of inhomogeneous Markov 
chains can be applied to check Assumption A.l (compare |Douc et al. 2003) ). 

3. In particular this theorems can be applied in case when n n ^ n = vr and 
Pn,x n — £7T i f° r some e > 0, as considered in |Kohn &: Nott 20 05J. 

4. Assumptions used here differ from those in [Atchade fc Rosenthal 2005] . 
where A.l and A. 2 are as follows: 

A.l' (uniform ergodicity) For all j > 0, n > 0, x G X and x n G X n+1 , 

\\PU.M-^{')\\ v <RjV{x). 

A. 2' (time-stability) For all x G X, x n G A' n+1 , y fc G A"* 1 , 

\\Pn+k,(x n ,y k )(x, •) - ^n,a-„(^, 0||v - K l T n a kV(x). 

The path-stability condition results from the time-stability condition 
by the triangle inequality, so assumption A. 2 presented here is weaker. 
Assumptions A.l here and A.l' in [Atcha de fc Rosenthal 2005] are in- 
comparable. \\P% - (a;, •) — 7r n) 5 n (-)||v does not have to converge even if 
A. 1-5 hold. It involves some computation, similar to this in the proof 
of Lemma 16.4.11 to show A.l', A. 2' together with A. 3-4 imply 

j-i 

|| Y[Pn+i,(xn,y' v -,v'i)( X ^) - 7r n+^(in, J /i,...,y i ')(') y ^ B ( k l, h, j)V (x) , 

8=0 



so if additionally A. 5 holds, 

i-i 

Pn4-i.(x„.v', v')( x i ■) — T^n+l.(x„.v', uO(") 

v n £ 



i- 1 ^ i 



8=0 



However our version is more complicated and might turn out to be 
difficult to check even if 7r n ,x n = 71 ■ 

5. Path-stability instead of time-stability condition enables to apply this 
ergodicity theorems to Monte Carlo algorithms that are inhomogeneous 
in their nature, like simulated annealing. In other words we can adapt 
Monte Carlo methods based on inhomogeneous Markov chains as well. 
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6. Finally, the theorem handles our introductory toy examples i.e. (X n ) n > 
that converges to the desired distribution satisfies A.1-A.4 (but does 
not satisfy A. 2' in [Atchade fc Rosenthal 2005] ). (Y n ) n > that fails to 
converge, fails to satisfy assumption A. 2 as well. 



6.4 Proofs 



We now proceed to prove theorems from Section 16.31 The proof follows 
closely Atchade and Rosenthal [Atchade fc Rosenthal 20 05J. Crucial point 
of the proof is Lemma 16.4. 1L Once Lemma 16.4.11 is shown under our modi- 
fied assumptions, we derive Theorems 16.3.31 and 16.3.41 in essentially identical 
manner as in [Atchade &: Rosenthal 20 05]. This part of the proof is purely 
expository and presented here for the sake of completeness. 
Let (^ r n )^__ 00 be a filtration defined by: 



J n 



{,tt} ifn<0 
a{X ,...,X n ) ifn>0 



(6.8) 



and 9 k ,x k ( x ) f( x ) - **,jf k (/)- 

Lemma 6.4.1. Assume A.1-A.4 hold. Then there are some constants < 
ki, &2 < 00 such that for any n > 0, j > 1 and any measurable function f 
with \ f\ <V, we have: 



E x {9n+j,X n+J ( X n+j)\Fn) < B (fci, k 2 , j)V (x ) . 



(6.9) 



The proof of Lemma 16.4.11 is given later in this section. We start with 
Theorems 16.3.31 and I6.3.4L 



Proof of Theorem \6. 3.3\. Let n = in Lemma l6.4.1[ We obtain the following: 



l-r, (//,v (.V,)^.) n = \E X0 {f{X 3 ) - tt^ (/)) I < B(h,k 2 ,j)V(xo), 



for all \ f\ < V, which is Theorem 16.3.31 



□ 
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Proof of Theorem \6. 3.4\ To prove Theorem 16.3.41 we will use the theory of 
mixingales. Theorem 16.5.21 used here is presented in Appendix. Let 



Y n := f(X n ) -7r n j n (f) -E X0 {f(X n ) -7T n> ^(/)). 
The proof will proceed according to the following plan: 
1. Show that 

E X0 (f(X n ) - 7T jf n (/)) -»• as n -> oo. 



(6.10) 



(6.11) 



2. Show that (Y n ) n >o is a mixingale of size — | and use Theorem 16.5.21 to 
conclude that 



n— 1 
i=0 



as n 



oo P x . a.s. 



(6.12) 



3. The foregoing results in 



1 n— 1 
i=0 



as n 



oo P ao a.s. 



(6.13) 



This states Theorem 16.3.41 



To see that (16.111) holds, it is enough to recall Theorem 16.3.31 and As- 
sumption A. 5. 

To prove (16 .12ft consider first condition (16.301) . Since the filtration is 
defined by (16.81) . we have E(Y n \J 7 n+ j) = Y n and (16.301) is satisfied for any 
positive number sequences (c n ) and (ip n ). 

Condition (|6.29f) is obviously satisfied for j > n, for any positive number 
sequences (c n ) and (ip n ) as well, since EY n = 0. For the case j < n we will 
use Lemma 16.4.11 



n-JJ 1 1 2 



Ejg n ,x n (X n ) - (E X0 (g xS X n)))\^ 



n-j 



< 



E *0 [9n,X n ( X n)\^n- 



+ 



+ 



E X0 ( E XQ (g n ^ Xn (X n )) \T, 



n-j 



Ex \9n,X n ( X n)\F; n-j 

< B(ki, k 2 J)V(x ) + B(k u k 2 , n)V(x 
= 0(j- £ ) + 0(jC e ) = 0(r £ ) 



Ex (g n ,X n ( X n)\^ D 
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Now we set in (16.291) c n = 1 and take appropriate ipj, such that ipj = 
0(j~ e ). Hence {Y n ) n > is a mixingale of size — |. Since ^ = 
and — 1 < min{ — |, | — 1}, we can apply Theorem 16.5.21 and conclude that 
\ YhZq Y i -> as n -> oo P Xo a.s. 

Combining (16.111) and (16.121) . we get ( 16 . 1 3H by an elementary argument. 

□ 



Now we proceed to prove Lemma 16.4.11 



Proof of Lemma 6.4-1. Note that 



^xAd-nxJ = K n x(f - n x (/)) = P xo a.s. (6.14) 



The idea of the proof is to split the quantity ^^(s'n+^x^ ^n+j)!-^") n>> 
into two terms, say A and B and bound them using Assumptions A.l, A. 2 
and A. 3. 

Denote by (x n , yj) = (x n , y%, . . . , yj) a trajectory of length n + j. Accord- 
ing to this notation we will usually write yi for Given (X , . . . , X n ) = x n 
we have 

Ex \9n,X n {Xn+j)\X n = X^j = 

= J 9n,Xn{yj) P n+j-h^n,yu-,yj-l)(yj-h d Vj) ■ ■ ■ Pn,x n (x n , dj/i) 

= r)j-x{x n ) + 

+ y g^Xniy^Pn+j-Ux^y'^iyj-l, dy j )P n+J _ 2 ,(x n ,y j . 2 )(yj-2, dyj-t) . . . 
■ ■ ■ Pn,x n i%n i dy\ ) , 

where y', — (y[, . . . , y'j) is as in Assumption A. 2, and 

T)j-i(x n ) = 

= J 9n,x n (yj) (•Pn+i-l,(2„ ) fe_ 1 )(%--l, dyj) - Pn+j-l^y'^iVj-U 

(Vj-2, dyj-i) Pn,x n {Xn, dyi) . 



92 



By exchanging transition kernels for all coordinates, we get: 



3-1 j'-i 

E xo(g n ,X n {X n +j)\X n = X n) = ^Vk{ x n) + ( J[ Pn+i,(x n ,y'd) 9n,x n ( x n) , 

k=l i=0 

(6.15) 

where 



Vk(x r . 



/ ( II 9n,x n (yk+l) 
J i=k+l 



Pn+k,(x n ,y k )(yk, dyk+l) - Pn+k,(x n ,y' k )(yk, d-Vk+l, 

dyi). 



(6.16) 



Consider the second term of the right hand side of (16 . 1 5H : 



J- 1 



Pn+i,(S n ,y'i)j 9n,x n ( x n) 
3-1 



i=0 

X\Pn+tA^y[))f( X n) ~ Kn,x n (f) 
i=0 

J- 1 . 

\\Pn+i,{x n ,y[)jf{Xn) ~ ^n+l,{x n ,y[){f) 
i=0 
3-1 

— IT Pn+i,(xn,yi)( x n, ') ~~ 7r n+Z,(£„,y|)(") 
i=0 

< R j V(x n ) + K 2 T n aj, 



Kn+lXx n ,y[)(.f) - Kn,x n (f) 



(6.17) 



where the inequalities result from Assumptions A.l and A. 3. 

We will now bound the first term of the right hand side of ( 16.151) . Note 
that since g n ,x n {yk+i) = f(Vk+i)-^n^ n (f) and n n ^ n (f) given x n is some real 
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number, we obtain: 



3-1 



Y\ Pn+i,(x n ,yl)J 9n,x n (Uk+l) ~ 
i=k+l 

3-1 

= ( II • P ri+t,(5„ ) ^)J/(yfe+l> 



7T n ,s n (/) (6.18) 



i=fc+l 



and 



Kn,x n (f) [Pn+k,(x n ,y k )(Uk, dy k+X ) - P n +k,{x ni y' k ){Vki dVk+l) 

P n +k-i,(x n ,yk-i)(yk-udyk) P n<i;n (x n , dyx) = 0.(6.19) 

Hence using (16.181) and (16 . 1 91) we get 



/ ( II Pn+hixn^jfiVk+l) 



i=k+l 

(^P n +k,(x n ,y k )(yk, dy k +i) - P n +k,{x n , y ' k )(yk, dy k +i) 
Pn+k-l,(x n ,y k _ 1 )(y k -i,dy k ) P n dyi). 



(6.20) 



Since for each < / < j — k — 2 and y" we have 



3-1 



I [ Pn+i,(x„,%))f(yk+l) 



i=k+l 



3-1 

Y\_ Pn+i,{x„,y'^)f{yk+l) 
i=k+l 

- ^ri+k+l+l,{x n ,y' k+1 ,y'i)(f) + 7r n+A;+l+Z,(x„,^ +1 ,^') (/) 
3-1 

— II Pn+i,{x n ;yi{yk+l, ') ~ Kn+k+l+l,(x n ,y' k+1 ,y' l '){') 
i=k+l 

+ Kn+fe+l+J,(s n ,j/^ +1 ,^')(/)l> 



V 

(6.21) 



we can apply A.l and write an analogous equality to (16.190 for 7rn+fc+i+J,(a n ,fifc x,Sj")(<jO 
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resulting from A.l to get: 



\Vk(x n )\ < sup / Rj-i-kV (y k+1 )I(y k+ i) 

I:X-*{-l,l} J 

[Pn+k,(x„,y k ){yk, dy k +i) - P n +k,(x n ,y' k )(yk, dy k+1 )^ 
Pn+k )(y k -i, dy k ) P n ,x n {x n , dyi) 

< R,^ f Kl r n a k V( Vk ) 

)(yk-i, dy k ) P n ,x n { x n, dyi) 

< r r n a k E X0 {V(X n+k ) \X n = x n ) . (6.22) 

Where the second inequality results from path-stability condition A. 2 and ro 
is some finite constant, since K\ < oo and (R n ) — > as n — > oo. 

Putting flBTfl) and (11x221 together in (16151) . we get: 

\ E x (g n ,X n ( X n+j)\3 : ' n) | < 

3-1 

< RjV(X n ) + K 2 r naj + r r n akE X0 (V{X n+k ) \T n ) . (6.23) 

k=l 

By Assumption A. 3 we have 

| E Xo (g n+j> x n+j (X n+j ) \F n ) | < | E Xo (g Ht x n ( x n+j) \Fn)\ 

+ E x (\n n+j>Xn+ . {f) ~ TT nt x n (/) | 

< 1^0(^(^)1^)1+^2^, (6.24) 

We now combine (16.231 ) and ( 16.24ft to obtain the first inequality of the 
following bound: 
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\\E X0 (g n+j ,x n+j { X n +j )\^n)\\ 2 < R j \\V(X n )\\ 2 + 2K 2 r n a j + 

i-i 

J2^k\\E Xo (V(X n+k )\^ n )\\ 2 
k=l 

<R j \\V{X n )\\ 2 + 

j 

+ max{r , 2K 2 }r n ^ a fc ||^(X n+fc ) || 2 
k=l 

<R j ^/K~ 3 V(X ) + 

j 

+ max{r , 2K 2 }r n ^ a k ^/K^V(X ) 

k=l 

<V{x ){r z R j + r 2 T n (j) j ), (6.25) 
where we use Assumption A. 4 and apply 

\\E X0 (V{X n+k )\F n )\\ 2 = {E[(E X0 (V(X n+k )\T n )) 2 }} 1/2 

< {E{E X0 {V 2 {X n+k )\T n ))} l/2 
= {EV 2 {X n+k )} 1 ' 2 = \\V{X n+k )\\ 2 

The constants in ( 16.2511 are defined as r 3 := \/Ks, r 2 := Yn.ax.{ro 1 2K 2 \yfK3 
and 4>j := Yli=i a k- 

Since (J r n )'^L_ 00 is a filtration, T n C jF n+J _ fc , for k = 1, . . . , j and therefore 

Exq (9 n +j,X n+j ( X n+j)\^ n) = E XQ (E Xq (g n+jyXn+j ( X n+j) \F n+j-k) \^ri) ■ 

This implies 

{ E x {9 n +j,X n+j ( X n+j)\^ n)} < E Xo [{E xo (g n+j: x n+J ( X n+j)\^n+j-k) } \F r, 



And therefore 

E x {9„+j,X n+ j ( X n+j) \F n) 



< 



E XQ (g n+ j t x n+j ( X n+j)\^ n+j-k) 



We now apply ( 16.251) to the right hand side of (16.261) and get: 
E xo {g n+j;Xn+j ( x n+j) \F n ) < V{x ) (r 3 R k + r 2 r n+i _ fc fe ) 



(6.26) 



(6.27) 
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Finally, since (16.271) holds for every k = 1, . . . ,j, we can take the minimum: 

E * {9 n +j,x n+] ( X n +j ) \F n ) ^ < V{x ) mm {r 3 R k + r 2 T n+j _ k <f) k }. (6.28) 

Obviously V(x ) mini< fc <j {r 3 R k + r 2 r n+i _ fc fc } < V(x )B(k 1 , k 2 ,j) for some 
constants k\ and &2, which completes the proof of the lemma. □ 



Hence the proof of Theorems (|6.3.3I) and (16.3.41) is complete as well. 



6.5 Appendix - Mixingales 

We present here a version of Strong Law of Large Numbers for mixingales that 
is used to conclude the proof of Theorem l6.3.4[ Theorem 16.5.21 presented here 



is a version of Corollary 2.1 in Davidson fe Jong 1997| . For an introduction 



to mixingales see the books |Hall fc Heyde 1980| or [Davidson 1 994]. 



Let (Z n ) n > be a real-valued stochastic process on some probability space 

(fi,.F,P). Assume (Z n ) is L 2 -bounded, i.e. \\Z n \\ 2 = { J Zl(uj)dP(uj)} 1/2 < 
oo for all n > 0. Let (J r n )^ = _ oc be a filtration. 

Definition 6.5.1. The process (Z n ) n > is a L 2 -mixingale with respect to 
filtration (^ 7 n )^L_ 00 if there exist real number sequences (c n ) and (ip n ), ip„ — > 
as j ; — >• oo, such that for all n > and all j > 0, 



\E{Z n \r n ^)\\ 2 <c n ^ (6.29) 



and 



\\Z n - E(Z n \J z n+ j)\\ 2 < c n xp j+ i. (6.30) 

If for some A > 0, ip n = 0{n~ x ~ e ) for some e > 0, we say that mixingale 
Z n is of size —A. 

Theorem 6.5.2. Let (Z n ) be a L 2 -mixingale of size —A. If ^ — 0{n a ), 
where a < min{ — |, A — 1}, then - J2i=o —> a.s. 
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